ABSTRACT

This is the second paper in a series aimed at investigating the main
sources of uncertainty in measuring the observable parameters in
galaxies from their Spectral Energy Distributions (SEDs). In the
first paper (Dahlen et al. 2013) we presented a detailed account of
the photometric redshift measurements and an error analysis of this
process. In this paper we perform a comprehensive study of the main
sources of random and systematic error in stellar mass estimates for
galaxies, and their relative contributions to the associated error
budget. Since there is no prior knowledge of the stellar mass of
galaxies (unlike their photometric redshifts), we use mock galaxy
catalogs with simulated multi-waveband photometry and known redshift,
stellar mass, age and extinction for individual galaxies.

The multi-waveband photometry for the simulated galaxies was
generated in 13 filters spanning from U-band to mid-infrared
wavelengths. Given the different parameters affecting stellar mass
measurement (photometric S/N ratios, SED fitting errors and
systematic effects), the inherent degeneracies and correlated errors,
we formulated different simulated galaxy catalogs to quantify these
effects individually. For comparison, we also generated catalogs
based on observed photometric data of real galaxies in the
GOODS-South field, spanning the same passbands. The simulated and
observed catalogs were provided to a number of teams within the
Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey
(CANDELS) collaboration to estimate the stellar masses for individual
galaxies. A total of eleven teams participated, with different
combinations of stellar mass measurement codes/methods, population
synthesis models, star formation histories, extinction and age.

For each simulated galaxy, the difference between the input stellar
mass, $M_{\rm input}$, and that estimated by each team,
$M_{\rm est}$, is defined as
$\Delta\log(M) \equiv \log(M_{\rm estimated}) - \log(M_{\rm input})$,
and used to identify the most fundamental parameters affecting
stellar mass estimates in galaxies, with the following results:
(1) no significant bias in $\Delta\log(M)$ was found among different
codes, with all having comparable scatter
($\sigma(\Delta\log(M)) = 0.136$ dex). The estimated stellar mass
values are seriously affected by low photometric S/N ratios, with the
rms scatter increasing for galaxies with $H_{AB} > 26$ mag; (2) a
source of error contributing to the scatter in $\Delta\log(M)$ is
found to be due to photometric uncertainties (0.136 dex) and low
resolution in age and extinction grids when generating the SED
templates; (3) the median of stellar masses among different methods
provides a stable measure of the mass associated with any given
galaxy ($\sigma(\Delta\log(M)) = 0.142$ dex); (4) the
$\Delta\log(M)$ values are strongly correlated with deviations in age
(defined as the difference between the estimated and expected
values), with a weaker correlation with extinction; (5) the rms
scatter in the estimated stellar masses due to free parameters (after
fixing redshifts and IMF) is quantified and found to be
$\sigma(\Delta\log(M)) = 0.110$ dex; (6) using the observed data, we
studied the sensitivity of stellar masses to both the population
synthesis codes and inclusion of nebular emission lines and found
them to affect the stellar mass by 0.2 dex and 0.3 dex respectively.

Keywords: galaxies: distances and redshifts - galaxies: high-redshift - galaxies: photometry - surveys 


1 Department of Physics and Astronomy, University of California,
Riverside, CA 92521

2 Space Telescope Science Institute, 3700 San Martin Drive,
Baltimore, MD 21218

3 Physics Department, CUNY NYC College of Technology, 300
Jay Street, Brooklyn, NY 11201

4 UCO/Lick Observatory, Department of Astronomy and Astrophysics,
University of California, Santa Cruz, CA 95064

5 Department of Astronomy, The University of Texas at
Austin, Austin, TX 78712

6 INAF, Osservatorio Astronomico di Roma, Via Frascati 33,
I00040, Monteporzio, Italy

7 Center for Astronomy and Astrophysics, Observatorio Astronomico
de Lisboa, Tapada da Ajuda, 1349-018, Lisboa, Portugal

8 Department of Astronomy, University of Massachusetts, 710
North Pleasant Street, Amherst, MA 01003

9 Kavli Institute for Particle Physics and Cosmology, Stanford
University, Stanford, CA 94305

10 Department of Physics and Astronomy, Texas A&M Research
Foundation, College Station, TX 77843

11 National Optical Astronomy Observatories, 950 N Cherry
Avenue, Tucson, AZ 85719

12 Max-Planck-Institut für extraterrestrische Physik,
Giessenbachstrasse 1, D-85748 Garching bei München, Germany

13 Department of Physics and Astronomy, Rutgers, The State
University of New Jersey, 136 Frelinghuysen Road, Piscataway,
NJ 08854






1. INTRODUCTION 

The questions of what governs the observed properties of galaxies,
the reason behind the correlations among these properties and how
they change with look-back time, are among the most fundamental in
observational astronomy today. This requires accurate measurement of
redshifts as well as the rest-frame observables. In particular,
detailed knowledge of the statistical properties of galaxies (i.e.
luminosity and mass functions) at different redshifts is essential to
constrain current hierarchical models for the formation of galaxies.
This requires large and deep surveys with multi-waveband photometry,
photometric redshifts and stellar mass estimates.

The installation of wide field-of-view detectors with high optical
and infrared quantum efficiency on space and ground-based
observatories has now allowed construction of multi-waveband, large
and deep galaxy surveys. These surveys occupy a large portion of the
Area-Depth parameter space, from the very deep Hubble Ultra Deep
Field (HUDF; Beckwith et al. 2006), designed for studies of very high
redshift galaxies, to the wide-area Cosmic Evolution Survey (COSMOS;
Scoville et al. 2007), formulated to study the large scale structure
in the Universe and its evolution with redshift. These are
complemented by the intermediate surveys (in terms of depth and area)
such as the Great Observatories Origins Deep Survey (GOODS;
Giavalisco et al. 2004), designed specifically for studies of the
evolution of galaxies to high redshifts. Recently, the wavelength
range of these surveys has been extended to near-infrared bands in a
Multi-cycle Treasury program, the Cosmic Assembly Near-infrared Deep
Extragalactic Legacy Survey (CANDELS; Koekemoer et al. 2011; Grogin
et al. 2011). One important addition to these observations is the
availability of deep mid-infrared data (3.6-8.0 $\mu$m) from the
Spitzer Space Telescope, extending the observed wavelength range to
8 $\mu$m (Ashby et al. 2013). This is essential in constraining the
Spectral Energy Distributions (SEDs) of galaxies and in estimating
accurate photometric redshifts and stellar masses spanning a range of
redshifts.

The multi-waveband data from these surveys are extensively used to
study the luminosity function and mass function of galaxies to very
high redshifts, with often divergent results (Ouchi et al. 2009;
Bouwens et al. 2011; Finkelstein et al. 2012; Dahlen et al. 2013;
McLure et al. 2013; Schenker et al. 2013). This is done through
fitting of the observed SEDs of individual galaxies to model
templates in order to estimate their photometric redshifts or measure
rest-frame luminosities. However, there are a number of concerns
regarding this process. First, this requires accurate photometry for
galaxies. Given that the photometric data points used for the SED
fits are observed by different telescopes and instruments, with
different point spread functions (PSFs), one needs to reduce them to
the same scale (i.e. images with the highest resolution). This is to
ensure that they are corrected so that the ratios of fluxes in
different bands refer to the same regions of galaxies. Second, this
requires clear understanding of the accuracy and biases in
photometric redshift and stellar mass measurement. Third, at the
basic level, different investigators use different techniques, codes,
templates and initial parameters to fit the observed SEDs and extract
observable information from them. This alone introduces unknown
differences in the photometric redshift and stellar mass estimates
for the same galaxy. The first problem is generally addressed by
degrading the data to a common PSF, or by fitting templates for
galaxies from the higher resolution image convolved with a kernel to
match the PSF of the lower resolution images, using the Template
Fitting (TFIT) technique (Laidler et al. 2007). This has successfully
been used to generate self-consistent multiband datasets for
individual galaxies across the wavelength range covered (Guo et al.
2013; Santini et al. 2015; Nayyeri et al. 2015, in preparation).
However, there are still outstanding issues regarding the second and
third points. In Dahlen et al. (2013), we addressed systematic
uncertainties in photometric redshift estimation. In this paper, we
focus on the stellar mass measurement.

The most common method for measuring the stellar masses of galaxies
is to fit their observed SEDs, covering the UV/optical/infrared
wavelength range, to templates generated from population synthesis
models. The templates consist of a large grid of model SEDs with a
range of free parameters, including redshift, Star Formation History
(SFH), age, prescription for dust extinction and metallicity. For any
galaxy, the parameters corresponding to the template SED which best
fits its observed SED (minimum $\chi^2$) are assigned to that galaxy.
Having measured the M/L ratio of the galaxy, and knowing its absolute
luminosity, one can then estimate its stellar mass, as well as other
physical parameters. However, there is significant degeneracy in this
procedure. The fitting techniques do not necessarily yield unique
models, with various combinations of free parameters leading to
equally acceptable fits. Furthermore, the final estimate of the
stellar mass also depends on technical details such as the population
synthesis models used to generate template SEDs, the fitting
technique, the code used and the S/N ratio of the observed
photometric data. Therefore, it is important to understand the
dependence of the stellar mass on each of these parameters and to
disentangle the interplay between them. With this in mind, we have
undertaken an extensive investigation of the sources of uncertainty
in the stellar mass measurement from broad-band photometry. The time
is ripe for such a study, with the availability of the CANDELS data
spanning a wide range in wavelength.
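The template-fitting step described above can be illustrated with a minimal sketch. It is not the implementation used by any of the participating codes: the function name `fit_templates`, the toy template grid and the assumption that each template is normalized to one solar mass are all hypothetical choices made for illustration. For every template the best-fit normalization is found analytically, and the template with the smallest $\chi^2$ supplies the stellar mass.

```python
import numpy as np

def fit_templates(flux, flux_err, templates):
    """Chi-square fit of one observed SED against a grid of model SEDs.

    flux, flux_err : observed fluxes and 1-sigma errors, shape (n_bands,)
    templates      : model fluxes per unit stellar mass (assumed
                     normalization), shape (n_models, n_bands)
    Returns (index of best template, its best-fit scale, chi2 of all models).
    """
    weight = 1.0 / flux_err**2
    # Best-fit normalization of each template, from d(chi2)/d(scale) = 0.
    scale = (templates * flux * weight).sum(axis=1) / (templates**2 * weight).sum(axis=1)
    residual = flux - scale[:, None] * templates
    chi2 = (residual**2 * weight).sum(axis=1)
    best = int(np.argmin(chi2))
    return best, scale[best], chi2

# Toy grid of three templates in four bands (illustrative numbers only).
templates = np.array([[1.0, 2.0, 3.0, 4.0],
                      [4.0, 3.0, 2.0, 1.0],
                      [1.0, 1.0, 1.0, 1.0]])
true_mass = 5.0e9                       # hypothetical input mass in solar masses
flux = true_mass * templates[1]         # noise-free photometry from template 1
flux_err = 0.05 * flux
best, mass, chi2 = fit_templates(flux, flux_err, templates)
```

Because the scale factor multiplies a template defined per unit mass, the returned `mass` plays the role of the M/L-based stellar mass in the text. Real codes additionally loop over redshift, extinction and SFH, which is where the degeneracies discussed above enter.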

We perform two classes of tests: (1) comparison of estimated stellar
masses with “true” ones generated from theoretical mock catalogs and
(2) comparison of estimated stellar masses from different codes and
methods applied to observational data, where the “true” masses are
not known. This allows a test of internal consistency between
different stellar mass methods, aiming to understand sources
responsible for the observed divergences between them. The CANDELS
data used for this purpose are extremely deep, so the photometry has
very low measurement errors.

We generated simulated and real multi-waveband photometric catalogs
of galaxies with known redshifts and stellar masses and asked a
number of experts within the CANDELS team to independently estimate
the observable quantities associated with them. We then compared the
stellar masses with the “true” values in the mock catalogs and the
measurements between different teams, aiming first to have a
realistic estimate of the error budget and then to develop a
prescription to acquire the most accurate stellar mass for individual
galaxies. Furthermore, we aim to understand parameters responsible
for the observed divergences between different algorithms used for
stellar mass measurement.
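As a rough illustration of the comparison described above, the sketch below computes the logarithmic mass offset between estimated and input masses, its mean (bias) and standard deviation (scatter), and a per-galaxy median over teams. The function names and the toy numbers are hypothetical, and taking the median in log space is an assumption of this sketch rather than a statement of the procedure used in this paper.

```python
import numpy as np

def delta_log_mass(m_est, m_input):
    """Logarithmic offset log10(M_est) - log10(M_input), in dex."""
    return np.log10(m_est) - np.log10(m_input)

def bias_and_scatter(dlogm):
    """Mean offset (bias) and standard deviation (scatter), both in dex."""
    dlogm = np.asarray(dlogm, dtype=float)
    return float(np.mean(dlogm)), float(np.std(dlogm))

def median_mass(m_est_by_team):
    """Per-galaxy median over teams (axis 0), taken in log space.

    NaN entries (a team that did not fit a galaxy) are ignored.
    """
    return 10.0 ** np.nanmedian(np.log10(m_est_by_team), axis=0)

# Toy example: three galaxies, three teams offset by +0.1, 0 and -0.1 dex.
m_input = np.array([1.0e9, 1.0e10, 1.0e11])
m_est_by_team = np.array([m_input * 10**0.1, m_input, m_input * 10**-0.1])

bias, scatter = bias_and_scatter(delta_log_mass(m_est_by_team[0], m_input))
m_median = median_mass(m_est_by_team)
```

In this toy case the first team is biased by 0.1 dex with no scatter, while the median over the three teams recovers the input masses.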

Several studies have recently undertaken similar investigations by
addressing the accuracy of predicted physical parameters in galaxies
using simulated catalogs (Wuyts et al. 2009; Lee et al. 2012; Pforr
et al. 2012). These studies often used one population synthesis code
to generate model templates (Wuyts et al. 2009) and a single SED
fitting technique (Longhetti & Saracco 2009). Furthermore, in the
fitting process they fit all the free parameters simultaneously (i.e.
age, metallicity, SFH, mass), causing serious degeneracies between
the predicted parameters. Moreover, they either use a limited
redshift range (Wuyts et al. 2009) or are restricted to certain
galaxy types (Lee et al. 2012) and are hardly constrained by the
observational data (Pforr et al. 2012). Pforr et al. (2013)
investigated the dependence of results on different population
models, used a wider range in redshift, and explored the dependence
of results on wavelength coverage and photometric filters. While they
used Maraston (2005) as their population synthesis model, they also
tested the results from other codes but used the same SED fitting
method and procedure to estimate the parameters, showing serious
degeneracies. None of these studies explores the dependence of the
estimated stellar masses on the nebular emission lines, which has
been shown to be significant (de Barros et al. 2014).

This paper complements previous studies in various ways. It uses ten
independent SED fitting techniques and codes from different teams
and, at the same time, explores dependence of each of these results
on a variety of population synthesis models. Furthermore, the mock
catalogs generated for this purpose are selected to resemble observed
galaxy surveys (i.e. CANDELS) in terms of redshift distribution,
wavelength coverage and photometric uncertainty, so that the results
would be directly applicable to the observed data. In addition to
simulations, it also uses observed photometry and real data to
explore the internal consistency of the stellar masses measured from
different procedures. By fixing the parameters in the SED fitting
process to those of the input mock catalogs, we study the degeneracy
amongst the parameters, estimating the errors contributed from each
parameter to the final stellar mass. The present study also
investigates dependence of stellar mass on nebular line emissions.
Finally, the errors associated with individual physical parameters
are estimated and their contribution to the total error budget
calculated. Results from this study are directly used to estimate
stellar masses for the CANDELS galaxies by finding the technique
which leads to the most accurate measurement.

In Section 2, we present the procedures and the tests designed for
this study. In Section 3 the participating teams are introduced, with
a brief description of the methods and techniques used by each team.
Sections 4-7 present different tests and explore sources of
uncertainty and bias in stellar mass measurements from different
teams. Comparison with other similar studies in the literature is
performed in Section 8, with the error budgets estimated and
discussed in Section 9. Our conclusions are summarized in Section 10.
Throughout, we assume standard cosmology with
$H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$, $\Omega_m = 0.3$ and
$\Omega_\Lambda = 0.7$. Magnitudes are all in the AB system
(Gunn & Oke 1983).

2. THE PROCEDURE 

In this investigation, we carry out four different tests, designed to
explore different types of systematic errors in stellar mass
measurement. We estimate stellar masses from different catalogs: an
empirical mock catalog (TEST-1), a Semi-Analytic Mock catalog (SAM;
TEST-2) and a “real” observational catalog (TEST-3 and TEST-4). The
main parameters used to generate the mock catalogs and the input to
stellar mass measurement codes (discussed in Section 3) are listed in
Table 1. In Appendix I, we summarize definitions of the stellar
masses used in the SAMs in this study and most commonly used in the
literature. TEST-1, developed to evaluate different SED fitting
codes, fits simulated data for galaxies with simple star-formation
histories (SFH), using a limited number of free parameters (this test
is strongly constrained). The mock catalogs are generated to have a
similar distribution of physical parameters to the observed catalogs
(presented in Appendix II). TEST-2A and TEST-2B fit simulated data
for galaxies with more complex SFHs drawn from a semi-analytic model.
TEST-2A fits the mock data, simulated to mimic real galaxies as
closely as possible. TEST-2B is more constrained; there is no dust
and fits are restricted to using the same evolutionary synthesis
code. TEST-3A and TEST-3B compare masses when the same fitting
parameters and techniques, used in TEST-2A and TEST-2B, are applied
to real galaxies. TEST-4 repeats TEST-3A using somewhat shallower
near-IR data typical of pre-CANDELS observations. The simulated
multi-waveband mock catalogs used in TEST-2A and 2B were generated
with halos extracted from the Bolshoi N-body simulations (Klypin et
al. 2011; Behroozi et al. 2013) and populated using semi-analytic
models (Somerville et al. 2008; Somerville et al. 2012). The
bandpasses and quality of the photometry in all the tests approximate
the observed data from CANDELS. The stellar masses provided in the
mock catalogs are defined as the mass which is directly produced
through SED fits. The age is defined as the time since the onset of
star formation.
One of the main sources of error in stellar mass estimates is lack of
knowledge of the SFHs (e.g. Lee et al. 2014). Nearly all the methods
make very simple assumptions about this and even when diverse SFHs
are allowed, it is unclear whether the methods can correctly select
the right type of history based on the photometry, given all the
other uncertainties. The SAMs have a semi-realistic mix of complex
SFHs (including rising, constant, and declining); however, they do
not correctly reproduce the observed trends between galaxy mass and
Star Formation Rate (SFR), i.e. downsizing. The main characteristics
of the tests in this section are listed in Table 1, with an overall
comparison between different tests presented in Table 2 and detailed
below:

TEST-1: Test of the consistency of different SED fitting codes and
techniques. This test is designed to study how well different codes
can measure the stellar masses and if there is any difference
originating from the codes once we keep all the rest of the
parameters fixed. To do this, we generate a mock catalog with known
input parameters (redshift, stellar mass, age and extinction) using
templates produced from the Bruzual & Charlot (2003) population
synthesis models. To make the simulated galaxies comparable to the
real data, we add noise to their photometry. The parameters used to
generate the mock SEDs are listed in Table 1. There are a total of
559 galaxies in the TEST-1 mock catalog. The total number of
simulated galaxies in TEST-1, and the distribution of their physical
parameters, are taken to be close to the real spectroscopic catalog
in the GOODS-S field, used to calibrate photometric redshifts and the
SED fitting techniques. This allows results from TEST-1 to be
directly applicable to observations. Details about the TEST-1 mock
catalog and distribution of the observable parameters are given in
Appendix II.

The masses are derived by fixing the template SEDs to have the input
values (listed in Table 1) and ONLY fitting for two quantities: the
age of the star formation and color excess ($E(B-V)$). The age is
defined as the time since the onset of star formation (assuming an
exponentially declining SFR with a fixed $\tau$) and was constrained
between 10 Myr and the age of the Universe at the redshift of the
particular galaxy under consideration. The allowed range for the
color excess, $E(B-V)$, is taken to be between 0 and 1. The redshift
for each galaxy was fixed to its input value. Since the majority of
the parameters affecting the stellar mass measurement are fixed, the
only difference between the estimated stellar masses from the SED
fits ($M_{\rm est}/M_\odot$) and the expected stellar masses
($M_{\rm input}/M_\odot$) is due to differences between the codes and
the SED fitting techniques used between different teams.
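The upper limit on the fitted age depends on the adopted cosmology. The sketch below is one way to evaluate it, assuming the flat cosmology quoted in the Introduction ($H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$, $\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$) and the standard closed-form age for a flat matter plus cosmological-constant universe. The function names, the logarithmic spacing and the number of grid steps are illustrative assumptions; the individual teams' grids are not specified here.

```python
import numpy as np

H0 = 70.0                      # km/s/Mpc, as assumed in the text
OMEGA_M = 0.3
OMEGA_L = 0.7
HUBBLE_TIME_GYR = 977.8 / H0   # 1/H0 in Gyr (1 km/s/Mpc corresponds to 977.8 Gyr)

def age_of_universe_gyr(z):
    """Age of a flat Lambda-CDM universe at redshift z, in Gyr."""
    z = np.asarray(z, dtype=float)
    x = np.sqrt(OMEGA_L / OMEGA_M) * (1.0 + z) ** -1.5
    return HUBBLE_TIME_GYR * 2.0 / (3.0 * np.sqrt(OMEGA_L)) * np.arcsinh(x)

def age_grid_gyr(z, n_steps=50, min_age_gyr=0.01):
    """Log-spaced ages from 10 Myr up to the age of the Universe at z.

    The grid spacing and n_steps are hypothetical choices for this sketch.
    """
    max_age = float(age_of_universe_gyr(z))
    return np.logspace(np.log10(min_age_gyr), np.log10(max_age), n_steps)

grid = age_grid_gyr(2.0)
```

With these parameters the age is about 13.5 Gyr at z = 0 and a little over 3 Gyr at z = 2, so the allowed age range shrinks quickly toward high redshift.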

TEST-1 is based on a set of 13 filters consisting of: U-band
(VIMOS); optical F435W, F606W, F775W, F850LP (ACS); near-infrared
F105W, F125W, F160W (WFC3); HAWK-I K-band (VLT) and Spitzer/IRAC 3.6,
4.5, 5.8 and 8 $\mu$m. The selection criteria for galaxies in TEST-1
include: a) S/N > 5 in the H-band (the selection band in the
simulated catalog); b) detected with S/N > 1 in at least six
passbands; c) 0 < z < 4.
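The three selection criteria above translate directly into a boolean mask over a photometric catalog. The following is a minimal sketch under assumed conventions: fluxes and errors are stored as arrays of shape (n_galaxies, n_bands), the 13 filters are ordered as listed in the text so that F160W (H-band) is column 7, and the function name `select_galaxies` is hypothetical.

```python
import numpy as np

def select_galaxies(flux, flux_err, z, h_band_index, z_max=4.0):
    """Boolean mask for: S/N > 5 in H, S/N > 1 in >= 6 bands, 0 < z < z_max."""
    snr = flux / flux_err
    bright_in_h = snr[:, h_band_index] > 5.0
    n_detected = (snr > 1.0).sum(axis=1)
    in_z_range = (z > 0.0) & (z < z_max)
    return bright_in_h & (n_detected >= 6) & in_z_range

# Toy catalog: 4 galaxies, 13 bands, unit errors so flux equals S/N.
H_INDEX = 7                               # assumed position of F160W
flux_err = np.ones((4, 13))
flux = np.full((4, 13), 10.0)
flux[1, H_INDEX] = 3.0                    # fails the H-band S/N cut
flux[2, :] = 0.5
flux[2, [5, 6, 7, 8]] = 10.0              # only four bands with S/N > 1
z = np.array([1.0, 1.0, 2.0, 5.0])        # last galaxy lies outside 0 < z < 4

mask = select_galaxies(flux, flux_err, z, H_INDEX)
```

Only the first toy galaxy passes all three cuts. Setting `z_max=6.0` gives the corresponding redshift limit used for the TEST-2 mock catalogs.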

TEST-2: Test of the sensitivity of the stellar mass
estimates to the free parameters.

This test is developed to study the effect of free parameters on the
stellar mass measurement. It uses SAM catalogs containing 10,000
galaxies with known multi-waveband photometry, input mass, age,
extinction and metallicity. The SEDs were constructed using Bruzual &
Charlot (2003; BC03) models, with a modified version of the Charlot &
Fall (2000) prescription for dust treatment as discussed in
Somerville et al. (2012). Stellar mass and chemical evolution are
calculated assuming instantaneous recycling. The SFHs are diverse,
consisting of exponentially declining, constant and rising. The
redshift distribution for galaxies in the TEST-2 catalog closely
follows the photometric redshift distribution in the GOODS-S field
(Appendix II).

For all the galaxies in the TEST-2 mock catalogs, photometry is
provided in 13 filters: U-band (VIMOS); optical F435W, F606W, F775W,
F850LP (ACS); near-infrared F105W, F125W, F160W (WFC3); HAWK-I K-band
(VLT) and Spitzer/IRAC 3.6, 4.5, 5.8 and 8 $\mu$m. The selection
criteria for the mock catalog here are: a) S/N > 5 in the H-band (the
selection band in the simulated catalog); b) detected with S/N > 1 in
at least six passbands; c) z < 6. TEST-2 is performed in two stages:

TEST-2A: The mock catalog here is generated using a diversity of SFHs
(exponentially declining, rising and constant) and metallicities. The
data generated for this catalog have dust extinction applied to the
photometry and hence closely resemble the observations. To estimate
the stellar masses, the participating teams were not restricted and
were free to choose any template from any stellar population
synthesis code, SFH, metallicity and extinction law to fit the mock
SEDs. The only limitation was to use a Chabrier IMF and to fix the
redshift of the galaxy to its input value in the mock catalog.

TEST-2B: Unlike TEST-2A, the mock catalog here is generated by
constraining the free parameters. The SFH associated with template
SEDs is fixed to an exponentially declining form, with the templates
produced from BC03 with solar metallicity. No dust extinction is
applied to photometry in the mock catalog and therefore TEST-2B is
not representative of the “real” population of galaxies. Redshift is
fixed to its input value and a Chabrier IMF is used. The
participating teams were asked to use the same input parameters as
the ones used to generate the mock data.

Comparison between the stellar mass estimates from 
TEST-2A and TEST-2B will therefore reveal the effect 
of free parameters and degeneracy in the SED fitting 
process. 

TEST-3: Comparison of the stellar mass measurements
using real data

Having estimated the sources of scatter in stellar mass measurements
due to different codes (TEST-1) and due to degeneracy and interplay
between the free parameters (TEST-2), we now apply the methods to a
sample of observed SEDs with accurate multi-waveband data and
available spectroscopic redshifts, selected from the TFIT catalog in
the GOODS-S field (Guo et al. 2013). This is the same sample used in
Dahlen et al. (2013) to calibrate the templates for measuring
photometric redshifts. A total of 598 galaxies were used for this
test. For the SED fits, the galaxy redshifts were fixed to their
spectroscopic values. Unlike TEST-1 and TEST-2, where we used
simulated photometric catalogs and hence had estimates of the “true”
stellar mass, here we do not have any absolute measure of the stellar
mass and the comparison is only relative, measuring the consistency
between different approaches.

The photometry for TEST-3 is performed on the real data using the
TFIT technique (Guo et al. 2013) and consists of: U-band (VIMOS);
optical F435W, F606W, F775W, F850LP (ACS); near-infrared F098M,
F105W, F125W, F160W (WFC3); Ks (VLT/ISAAC) and mid-infrared
Spitzer/IRAC 3.6, 4.5, 5.8 and 8 $\mu$m. The F098M is only available
for the Early Release Survey part of the GOODS-S, while F105W is only
available for a sub-area of that field. TEST-3 is also done in two
stages:

TEST-3A: In this test no restriction was imposed on the free
parameters when generating the template SEDs for the fits, except for
the redshifts, which were fixed to their spectroscopic values, and
the IMF, which was chosen to be Chabrier (Table 1). The stellar
masses were subsequently estimated from different methods (Section 3)
and compared with each other.

TEST-3B: This is the same as TEST-3A with additional restrictions
imposed on the free parameters. This will show how close the results
from different teams would be when some of the parameters in the SED
fits are not allowed to vary. Therefore, it indicates the effect of
the free parameters (and their possible interplay) in stellar mass
measurements.

TEST-4: Tests the effect of selection wavelength
and near-infrared photometric depth on the stellar
mass measurement.

This is similar to TEST-3A with the only difference 
being the use of much shallower near-infrared data and 
selection in ACS z-band. This test is designed to examine 
the way different codes treat shallow infrared data and 
its effect on stellar mass measurement. As is often the 
case in galaxy surveys, due to smaller size of near-IR 
detectors, their lower sensitivity and the effect of sky 
brightness, the near-IR data are not as deep as their 
optical counterparts. We designed TEST-4 to examine 
the sensitivity of stellar mass on these parameters. 

3. THE SED FITTING TECHNIQUES AND STELLAR 
MASS MEASUREMENT 

The catalogs discussed in section 2 were provided to 
the CANDELS team members. Using the instructions 
for different tests (Table 1), the teams were asked to 
predict the stellar masses for galaxies in the catalogs, 
satisfying the requirements for each TEST. To perform 
this as objectively as possible, the M input values in the 
mock catalogs were not revealed to the participants. 

A total of ten teams participated in this exercise (not all the teams
participated in all the tests). In many cases, the codes and
templates used to measure the stellar masses were different from
those used for the photometric redshifts in Dahlen et al. (2013).
Below, details of the codes and the assumptions when applied to TESTs
2A, 3A and 4 are described (in these tests the participants were free
to choose templates from any population synthesis models or any SFH).
For TESTs 1, 2B and 3B, all the methods used BC03 evolutionary
synthesis models and exponentially declining star-formation
histories. Details are also listed in Table 2. Where possible, we use
the same identification for the teams as in Dahlen et al. (2013).

Acquaviva (1.A)- GalMC code (Acquaviva et al. 2011): The algorithm is
a Markov Chain Monte Carlo (MCMC) sampler based on Bayesian
statistics. In this approach, the SED fitting parameters (age, mass,
reddening, and e-folding time for $\tau$ models) are treated as
random variables. The parameter space is explored with a random walk
biased so that the frequency of visited locations is proportional to
the posterior Probability Density Function (PDF). The desired
intervals of the SED fitting parameters are then obtained by
marginalizing the PDF, which in MCMC simply corresponds to summing
over the points in the chains. Here, a new version of GalMC is used
(SpeedyMC; Acquaviva et al. 2012), which is 20,000 times faster; at
every step of the chain, the spectra are generated through
multi-linear interpolation of a library of pre-computed models. The
best-fit stellar masses and the 68% uncertainties are predicted from
these marginalized distributions.

For TESTs 2A, 3A and 4, this code used templates based on Charlot and
Bruzual (2007; CB07) while for other tests it used BC03 population
synthesis models. Two metallicities are used, Solar and
0.2 $Z_\odot$, with the one giving the optimum $\chi^2$ value chosen.

Finkelstein (4.B)- own code: This uses a $\chi^2$ fitting method with
the CB07 population synthesis model. It uses a hybrid SFH
(exponentially declining + rising star formation rate).

Fontana (6.C)- own code: This uses a $\chi^2$ fitting method with the
SED templates generated from BC03 and an exponentially declining SFR.
The templates are generated with both Calzetti and SMC dust models
and hence the code can choose between the two dust extinction
scenarios, whichever gives a better fit.

Gruetzbauch (7.D)- EAZY code, Brammer et al. (2008): Uses a $\chi^2$
fitting method with BC03 and an exponentially declining SFR with a
large set of $\tau$ values. Also uses a large set of metallicity and
extinction values.

Johnson (8.E)- SATMC code, Johnson (2013): Uses MCMC to fit the SEDs,
similar to method 1.A. BC03 templates are used with an instantaneous
burst of star formation. This is the only experiment which uses this
SFH. For the fit, all the parameters in the code are varied.

Papovich (9.F)- own code: This is a $\chi^2$ minimization code. It
uses an exponentially declining SFR. Solar metallicity is assumed.
The code uses templates based on BC03 models.

Pforr (10.G)- HyperZ code, Bolzonella et al. (2000): This is a
$\chi^2$ minimization code. It uses a hybrid SFH consisting of
exponentially declining, truncated and constant SFRs. In this
respect, 10.G is different from many of the methods listed in Table 3
but is similar to others (e.g. 4.B). This is the only method which
uses the Maraston (2005) population synthesis code to generate
templates.

Salvato (11.H)- Le Phare code (Arnouts & Ilbert): This uses a
$\chi^2$ minimization technique and the BC03 code to generate
templates. An exponentially declining star formation rate is used.
The prior $E(B-V) < 0.15$ is applied if the ratio $t/\tau > 4$ (i.e.
significant extinction is only allowed for galaxies with high SFR).

Wiklind (12.I)- own code, Wiklind et al. (2008): Uses $\chi^2$
minimization of the SEDs. The errors in stellar mass are estimated
from Monte Carlo simulations. Exponentially declining star formation
rates are used, with $\tau = 0$ representing an instantaneous burst.
The template SEDs are based on BC03.

Wuyts (13.J)- FAST code, Kriek et al. (2009): Uses a $\chi^2$ fitting
technique with an exponentially declining SFR. The templates are from
BC03 with solar metallicity.
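The MCMC approach described for methods 1.A and 8.E can be illustrated with a generic Metropolis random walk. This is a toy sketch and not the GalMC, SpeedyMC or SATMC implementation: the posterior is a hypothetical pair of independent Gaussians in (log mass, age) standing in for a real SED likelihood, and the step sizes, chain length and burn-in are arbitrary choices. It shows how a chain visits locations in proportion to the posterior and how marginalized 68% intervals follow from the chain points alone.

```python
import numpy as np

def metropolis(log_post, start, step, n_steps, seed=0):
    """Random-walk Metropolis sampler; returns a chain of shape (n_steps, n_par)."""
    rng = np.random.default_rng(seed)
    current = np.asarray(start, dtype=float)
    current_lp = log_post(current)
    chain = np.empty((n_steps, current.size))
    for i in range(n_steps):
        proposal = current + step * rng.standard_normal(current.size)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain

# Hypothetical toy posterior: Gaussian in log10(M/Msun) and in age (Gyr).
TRUE = np.array([10.0, 1.0])
SIGMA = np.array([0.1, 0.3])

def log_post(theta):
    return -0.5 * np.sum(((theta - TRUE) / SIGMA) ** 2)

chain = metropolis(log_post, start=[9.5, 2.0], step=np.array([0.1, 0.3]), n_steps=20000)
samples = chain[2000:]                       # discard burn-in
# Marginalizing over age means simply ignoring that column of the chain.
lo, med, hi = np.percentile(samples[:, 0], [16, 50, 84])
```

The 16th to 84th percentile range of the mass column is the marginalized 68% interval, regardless of how the other parameters are distributed in the chain.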

Details of individual methods are listed in Table 3. In

Table 1

Details of different TESTs developed for stellar mass measurement. The listed parameters are used to generate the mock catalogs and as
inputs in the codes discussed in section 3 to measure stellar masses.


TEST-1:

Fixed Parameters

IMF: Chabrier (limits: 0.1 < M/M⊙ < 100)
Redshift range 0 < z < 4
Stellar population templates: Bruzual and Charlot 2003 (BC03).
SFH: Single burst, exponentially declining, τ fixed at 0.1 Gyr.
Gas recycling: no.
Dust extinction law: Calzetti.
IGM Absorption: Madau law, flux set to zero at λ < 912 Å (restframe).
Metallicity: Z = Z⊙
Nebular Emission: not included.

Free Parameters

Age between 10 Myr and the Age of the Universe at the redshift of the galaxy.
E(B-V) between 0 and 1.

TEST-2A

Fixed Parameters

Chabrier IMF (0.1 < M/M⊙ < 100)
Redshift range 0 < z < 6
Redshift is fixed to its input value.
Dust extinction is applied to the photometry in the mock catalog.

Free Parameters

¹SFH, metallicity, extinction, population synthesis code

TEST-2B

Fixed Parameters

Templates: BC03 with Chabrier IMF (0.1 < M/M⊙ < 100)
Redshift range 0 < z < 6
¹SFH: exponentially declining SFR
No extinction applied to the photometric points; E(B-V) = 0
Metallicity: Solar
Emission lines: not included
Redshift fixed to the provided value in the mock catalog

Free Parameters

The exponential time scale τ and the age of the stellar population.

TEST-3A

Fixed Parameters

Observed F160W band selected multi-band TFIT photometric catalog for GOODS-S
The objects are fixed at their spectroscopic redshifts
Redshift range 0 < z < 6
Chabrier IMF (0.1 < M/M⊙ < 100)

Free Parameters

SFH, metallicity, extinction, population synthesis code, stellar mass, age


TEST-3B

Fixed Parameters

Observed F160W band selected multi-band TFIT photometric catalog for GOODS-S
Templates: BC03 [with Chabrier IMF (0.1 < M/M⊙ < 100)]
Extinction: E(B-V) = A_V = 0, i.e., no extinction
¹SFH: Exponentially declining
Metallicity: Solar
Redshift range 0 < z < 6

Free Parameters

Stellar mass, star formation time-scale τ, age

TEST-4

The same as TEST-3A but selected in ACS z-band, with shallower observed near-infrared data

Note: ¹ The SAMs use a diversity of SFHs depending on the host halo merger history. Therefore, the SFH of every mock galaxy is
fixed. The forms of the SFHs adopted here are used to generate the SED templates for the stellar mass measurement methods.


the next section we compare the input mass with the
stellar mass estimates independently measured from different
methods (Tables 2 and 3) to explore differences
as a function of the method (TEST-1), free parameters
(extinction, star formation history, age; TEST-2), templates
used and internal consistency (TEST-3) and the
photometric depth and selection wavelength (TEST-4).
This allows a study of the absolute consistency (i.e. how
well each code reproduces the expected mass) and the relative
consistency (i.e. how well the masses estimated by different
codes agree with each other). In the following sections, we perform
a step-by-step study of the above, using the information
in Tables 1 and 3.

4. TEST-1: COMPARISON OF STELLAR MASSES 
FROM DIFFERENT METHODS 

4.1. Dependence on the SED Fitting Codes 

The participating teams, listed in Table 3, used the
mock catalog generated for TEST-1 and independently
estimated the stellar mass for individual galaxies. For

Table 2

Parameters in the SED fitting methods that are kept fixed to the values listed in Table 1 or are left free in the fit.


Parameters                    TEST-1          TEST-2A  TEST-2B         TEST-3A  TEST-3B         TEST-4

Star Formation History        Exp. Declining  Free     Exp. Declining  Free     Exp. Declining  Free
Population Synthesis Models   BC03            Free     BC03            Free     BC03            Free
τ                             Fixed           Free     Fixed           Free     Fixed           Free
IMF                           Fixed           Fixed    Fixed           Fixed    Fixed           Fixed
Redshift                      Fixed           Fixed    Fixed           Fixed    Fixed           Fixed
Extinction                    Fixed           Free     None            Free     None            Free
Age                           Free            Free     Free            Free     Free            Free
Metallicity                   Fixed           Free     Fixed           Free     Fixed           Free
Nebular Emission              No              No       No              No       No              No
IGM Absorption                Fixed           Fixed    Fixed           Fixed    Fixed           Fixed

each code, Figure 1 shows Δ log(M) as a function of the input
mass, log(M_input), where Δ log(M) is defined as the difference
between the input mass and the stellar mass estimated
from that code, M_est: Δ log(M) = log(M_input) -
log(M_est). The very small scatter in the case of 1.A is
to be expected because the TEST-1 mock catalog was generated
with this method; it therefore confirms the
consistency between the input and estimated masses. As
a result, the observed scatter in the stellar mass from
method 1.A is likely due to the photometric errors
added to the mock data. This is supported by
Figure 2a, which shows an increase in the
scatter in Δ log(M) for method 1.A from bright to
faint magnitudes (see below). Figure 1 also confirms that
all the methods used in this experiment recover the input
mass values to good accuracy. There are no systematic
effects or mass-dependent biases, indicating that none of
the methods in Table 3 is significantly biased.

It is clear from Figure 1 that for most of the methods,
the scatter decreases towards the higher mass end
(M > 10^10 M⊙). This is because these galaxies are often
brighter, with higher photometric S/N ratios. This
is demonstrated in Figure 2a, where we study changes
in Δ log(M) as a function of H-band (F160W) magnitude,
showing an increase in the scatter at H_AB > 26
mag. This indicates that, keeping everything else the same,
the main inconsistency between the stellar mass estimates
from different codes arises for the
relatively fainter galaxies (those with lower photometric
S/N ratios) and is due to the different ways the photometric
errors are handled in the SED fitting process.
Figure 2b shows the change in Δ log(M) as a function
of redshift where, for most of the methods, we find no
correlation and hence no redshift-dependent biases. The
exception is 7.D, which shows a bias at z < 1. The reason
for this redshift-dependent bias is not
clear but is likely a degeneracy caused by using a
wide range of metallicities. The redshift distribution in
the TEST-1 catalog, presented in Appendix II, is fixed to
be the same as the observed (spectroscopic) distribution
for the GOODS-S field. Therefore, the results from this
study are directly applicable to the observed samples.

For each method, we estimated the rms, outlier fraction
and bias in Δ log(M) using galaxies in
the mock catalogs (i.e. the scatter in Δ log(M) from individual
methods among all the galaxies). Results are
listed in Table 4. The outlier fraction is defined as the
ratio of the number of galaxies with |Δ log(M)| > 0.5
to the total number of galaxies, while the bias is
defined as mean[Δ log(M)]. Overall, there is good agreement
between the estimated masses from different methods
and the input mass. Excluding the reference method 1.A
(0.134 dex), the rms values range from
0.141 dex (13.J) to 0.241 dex (11.H). The highest rms
values and outlier fractions are for methods 7.D, 8.E and
11.H. Method 8.E uses the MCMC technique, which is
different from what is used in the other methods (except for
method 1.A). Both 7.D and 11.H also show higher biases
(-0.059 and 0.057 respectively), contributing to the
relatively higher rms scatter. All these codes have relatively
low resolution E(B-V) and age grids. The lowest
rms scatter is associated with codes 6.C, 10.G and 13.J,
which have a relatively higher resolution in their E(B-V)
and age grids.
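
The three statistics defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `mass_residual_stats` and the toy masses are hypothetical, and the rms is taken about zero (the text does not say whether it is measured about zero or about the mean).

```python
import math


def mass_residual_stats(log_m_input, log_m_est, outlier_cut=0.5):
    """Return (rms, bias, outlier_fraction) of Delta log(M).

    Delta log(M) = log(M_input) - log(M_est), as defined in the text.
    Assumption: the rms is measured about zero, not about the mean.
    """
    delta = [m_in - m_est for m_in, m_est in zip(log_m_input, log_m_est)]
    n = len(delta)
    bias = sum(delta) / n                              # mean[Delta log(M)]
    rms = math.sqrt(sum(d * d for d in delta) / n)     # root-mean-square residual
    outlier_fraction = sum(abs(d) > outlier_cut for d in delta) / n
    return rms, bias, outlier_fraction


# Toy values for illustration only; they are not taken from the mock catalogs.
log_m_input = [9.0, 10.0, 11.0, 8.0]
log_m_est = [9.1, 9.9, 11.0, 8.6]
rms, bias, outlier_fraction = mass_residual_stats(log_m_input, log_m_est)
print(f"rms={rms:.3f} dex, bias={bias:.3f} dex, OLF={outlier_fraction:.2f}")
```

In this toy sample one galaxy out of four has |Δ log(M)| > 0.5, so it alone sets the outlier fraction and also dominates the rms, which is the behaviour the "no outliers" column of Table 4 is designed to isolate.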

Figure 3 shows changes in the rms, outlier fraction and
bias as a function of the S/N. There is a clear reduction
in the rms and outlier fractions with increasing S/N ratio.
However, the estimated bias from all the methods is
found to be independent of the S/N ratio, with a significant
reduction in the bias when the outliers are removed.
This supports our earlier conclusion that some of the differences
in the stellar mass measurements from different
methods could be attributed to low S/N ratios in the
photometric data.

We note that there is good agreement between the
input and estimated stellar mass values when using
the median of the stellar masses measured for individual
galaxies by the different methods (M_med). The
rms in Δ log(M_med) is 0.142 dex, where Δ log(M_med) =
log(M_input) - log(M_med). However, the median will be
affected if some of the methods are biased. As shown in
Table 4 and Figures 2a and 2b, for most of the methods
there is no indication of significant bias in the masses.
Since the same input parameters are used for all the
experiments in TEST-1, the median of the mass estimates
for each galaxy measured from different methods is less
affected by code-dependent uncertainties. Therefore, the
smaller rms for the median suggests that the numerical
noise (presumably due to different approximations and



Table 3 

Details of the methods and parameters used for stellar mass measurement 


Method 1.A

Team ID: 1
PI: Acquaviva
Code ID: A
Code: GalMC (Acquaviva et al. 2011)
Fitting Method: MCMC
Stellar Population Synthesis Templates: CB07 or BC03 (see the text)
Star Formation History: Constant
Extinction law: Calzetti, E(B-V) = 0.0 - 1.0
Ages: 10^6 - 1.4 x 10^10 yrs
Nebular emission: yes
Metallicity: Z⊙ and 0.2 Z⊙

Method 4.B

Team ID: 4
PI: Finkelstein
Code ID: B
Code: own code
Fitting Method: χ²
Stellar Population Synthesis Templates: CB07
Star Formation History: Exponentially declining, rising (τ = 0.0001, 0.01, 0.1, 1.0, 100.0, -0.3, -1.0, -10.0) Gyrs
(the negative values correspond to a rising SFR)
Extinction law: Calzetti, E(B-V) = 0.0 - 0.51
Ages: 1 Myr - 13 Gyrs
Nebular emission: yes
Metallicity: Z⊙ and 0.2 Z⊙

Method 6.C

Team ID: 6
PI: Fontana
Code ID: C
Code: own code
Fitting Method: χ²
Stellar Population Synthesis Templates: BC03
Star Formation History: Exponentially declining with τ = 0.1, 0.3, 0.6, 1, 2, 3, 5, 9, 15 Gyrs
Extinction law: Calzetti+SMC, E(B-V) = 0.0 - 1.1 in increments of 0.05
Ages: log(age) = 7 - 7.35 (in 0.05 steps), 7.4 - 8.9 (0.1 steps), 9 - 10.3 (0.05 steps)
Nebular emission: no
Metallicity: 0.2 Z⊙, 0.4 Z⊙, Z⊙, 2.5 Z⊙; also a subset of models with age < 1 Gyrs and Z⊙ = 0.02

Method 7.D

Team ID: 7
PI: Gruetzbauch
Code ID: D
Code: EAZY (Brammer et al. 2008)
Fitting Method: χ²
Stellar Population Synthesis Templates: BC03
Star Formation History: Exponentially declining with τ = 0.01, 0.03, 0.06, 0.1, 0.25, 0.5, 0.75, 1.0, 1.3, 1.7, 2.2, 2.7,
3.25, 3.75, 4.25, 4.75, 5.25, 5.75, 6.25, 6.75, 7.25, 7.75, 8.25, 8.75, 9.25, 9.75, 10.25, 10.75 Gyrs
Extinction law: Calzetti, A_V = 0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.33, 1.66, 2, 2.5
Ages: 1 Myr - 13 Gyrs; ages required to be smaller than the age of the Universe at each redshift
Nebular emission: no
Metallicity: (X, Y, Z): (0.7696, 0.2303, 0.0001), (0.7686, 0.231, 0.0004), (0.756, 0.24, 0.004),
(0.742, 0.25, 0.008), (0.70, 0.28, 0.02), (0.5980, 0.352, 0.0500), (0.4250, 0.475, 0.1000)


interpolations made in the fitting codes) is reduced by
combining results from different methods. This numerical
noise is small compared to other systematic uncertainties,
so the gain from taking the median rather than
using a single, well tested code is likely to be useful only
when values based on the same underlying set of assumptions
are desired.
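
As a rough illustration of the median-mass estimator discussed here, the sketch below takes, for each galaxy, the median of log(M_est) over the methods and compares its rms residual with that of the individual methods. The helper names and the three hypothetical methods "A", "B" and "C" with their toy masses are assumptions made for the example; the toy values were chosen so that the method-to-method differences cancel in the median.

```python
import math
import statistics


def rms(values):
    """Root-mean-square of a list of residuals (measured about zero)."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def median_log_mass(log_m_est_by_method):
    """Per-galaxy median of log(M_est) over all methods."""
    per_method = list(log_m_est_by_method.values())
    # zip(*...) regroups the per-method lists into one tuple per galaxy.
    return [statistics.median(values) for values in zip(*per_method)]


# Toy catalog: three galaxies, three hypothetical methods.
log_m_input = [9.0, 10.0, 11.0]
log_m_est_by_method = {
    "A": [9.2, 10.0, 10.9],
    "B": [9.0, 10.1, 11.0],
    "C": [8.9, 9.7, 11.3],
}

log_m_med = median_log_mass(log_m_est_by_method)
rms_median = rms([m_in - m for m_in, m in zip(log_m_input, log_m_med)])
rms_by_method = {
    name: rms([m_in - m for m_in, m in zip(log_m_input, est)])
    for name, est in log_m_est_by_method.items()
}
print(log_m_med, rms_median, rms_by_method)
```

The median only helps in this way when the individual methods scatter around the input value; a bias shared by most methods would survive the median, as the text notes.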

The rms scatter measured for the stellar masses in
TEST-1 is based on galaxy samples which cover a range
in luminosities and photometric S/N ratios, and on
methods which handle these errors differently. This also
contributes to the rms values in Table 4. To quantify this,
we measure the rms in Δ log(M) for individual galaxies
in the mock catalog from each method separately. In this
case, the rms in Δ log(M), estimated for each galaxy from
different methods, represents the genuine scatter among
different codes/methods, depending only on the way each
code/method treats the photometric errors. The change in
the rms scatter as a function of the S/N ratio for galaxies
in TEST-1 is presented in Figure 4. Given that for a
single galaxy the photometric errors are fixed, the relation
between the rms values in Δ log(M) (corresponding
to individual galaxies as measured from different codes)
and the S/N ratios reveals the extent to which the handling
of the photometric errors by each code affects the resulting
stellar mass estimates. The rms decreases with increasing
S/N and asymptotes around rms = 0.05 dex (at
S/N > 40), where the photometric uncertainties become
very small. This gives a measure of the systematic effects
in the stellar mass measurement entirely due to the

Method 8.E

Team ID: 8
PI: Johnson
Code ID: E
Code: SATMC (Johnson et al. 2013)
Fitting Method: MCMC
Stellar Population Synthesis Templates: BC03
Star Formation History: Instantaneous burst
Extinction law: Calzetti, E(B-V) = 0.0 - 4.5
Ages: 0.01 - 10 Gyr (unequally spaced and taken directly from BC03 library)
Nebular emission: no
Metallicity: 0.0001 Z⊙, 0.0004 Z⊙, 0.004 Z⊙, 0.02 Z⊙, 0.05 Z⊙

Method 9.F

Team ID: 9
PI: Papovich
Code ID: F
Code: Own Code
Fitting Method: χ²
Stellar Population Synthesis Templates: BC03
Star Formation History: Exponentially declining (τ = 0.001, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0, 100.0 Gyr)
Extinction law: Calzetti
Ages: 0.0251, 0.04, 0.064, 0.1015, 0.161, 0.255, 0.6405, 1.0152, 1.609, 2.5, 4.0, 6.25 and 10.0 Gyrs
Nebular emission: no
Metallicity: Solar, except for TEST-2A for which the following are used: 0.02, 0.2, 0.4, 1.0, 2.5 Z⊙

Method 10.G

Team ID: 10
PI: Pforr
Code ID: G
Code: HyperZ (Bolzonella et al. 2000)
Fitting Method: χ²
Stellar Population Synthesis Templates: M05
Star Formation History: Exponentially declining (τ = 0.1, 0.3, 1.0 Gyr), Constant SF at t = 0.1, 0.3, 1 Gyr,
zero SF afterwards, Constant star formation
Extinction law: none
Ages: 0 - 20 Gyr (221 in total, grid as in BC03 templates)
Nebular emission: no
Metallicity: 0.2 Z⊙, 0.5 Z⊙, 1.0 Z⊙, 2 Z⊙

Method 11.H

Team ID: 11
PI: Salvato
Code ID: H
Code: Le Phare (Arnouts & Ilbert)
Fitting Method: χ²
Stellar Population Synthesis Templates: BC03
Star Formation History: Exponentially declining (τ = 0.1, 0.3, 1, 2, 3, 5, 10, 15, 30 Gyr)
Extinction law: Calzetti
Ages: 0.01 - 13.5 Gyr
Nebular emission: yes
Metallicity: 0.02 Z⊙, 0.008 Z⊙

Method 12.I

Team ID: 12
PI: Wiklind
Code ID: I
Code: Own Code (Wiklind et al. 2008)
Fitting Method: χ²
Stellar Population Synthesis Templates: BC03
Star Formation History: Exponentially declining (τ = 0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1.0 Gyr) and instantaneous burst (τ = 0)
Extinction law: Calzetti
Ages: 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0, 7.0 Gyrs
Nebular emission: no
Metallicity: 0.2 Z⊙, 0.4 Z⊙, 1.0 Z⊙, 2.5 Z⊙

methods used (when all the rest of the parameters are
fixed and photometric errors are negligible).

The rms in Δ log(M) estimated for method 1.A (Table 4)
is mainly due to the contribution of photometric
errors to the stellar mass measurement and not to the method
or the SED templates used (because the template SEDs
in TEST-1 were generated by this code and were used
again to estimate the observable parameters after
introducing photometric noise to the SEDs). Therefore,
we estimate the intrinsic uncertainty in Δ log(M) associated
with each method (in Table 4), $\sigma_{\mathrm{method},i}$, as
$\sqrt{\sigma_i^2 - \sigma_{1.A}^2}$, where $\sigma_i$ and $\sigma_{1.A}$ are the rms values for
individual methods and for method 1.A (corresponding
to photometric uncertainties) respectively. Here we
assume that differences in $\sigma$ due to the treatment of age
and extinction among different codes are negligible (however,
see the next section). The total uncertainty in
Δ log(M) due to differences in the codes/methods used is
therefore $\sigma_m = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\sigma_{\mathrm{method},i}^2} = 0.136$ dex, where

Method 13.J

Team ID: 13
PI: Wuyts
Code ID: J
Code: FAST (Kriek et al. 2009)
Fitting Method: χ²
Stellar Population Synthesis Templates: BC03
Star Formation History: Exponentially declining (log(τ) = 8.5-10 in increments of 0.1)
Extinction law: Calzetti
Ages: log(age) = 7.7 to 10.1 in increments of 0.1
Nebular emission: no
Metallicity: solar

[Figure 1 panels: Δ log(M) plotted against log(M_input/M⊙).]

Figure 1. The stellar mass difference (Δ log(M)) as a function of log(M_input) measured from all the participating methods, using TEST-1.
Δ log(M) is defined as Δ log(M) = log(M_input) - log(M_est), where M_est is the estimated stellar mass. The red horizontal line shows the
expected relation if the input stellar mass is exactly reproduced. A total of 559 simulated galaxies are used. This test examines the sensitivity
of the stellar mass to the methods/codes listed in Table 3.

n is the number of methods/codes used. Using the median
rms value from all the methods (Table 4), we estimate
$\sqrt{\sigma_{\mathrm{median}}^2 - \sigma_{1.A}^2} = 0.047$ dex. This is close to the
rms = 0.050 dex we estimated for systematic errors from
Figure 4 and is significantly smaller than the rms scatter
of 0.136 dex due to the different methods/codes used. This
confirms that the median mass (among all the methods/codes)
provides the closest estimate to the “real”
stellar mass.
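
The quoted numbers can be checked directly from the rms column of Table 4. The short sketch below subtracts the 1.A rms in quadrature from each of the other methods and then combines the results as a root-mean-square. That the combination is a mean over the n = 9 methods other than 1.A is an inference from the tabulated values, not something stated explicitly in the text; the variable names are illustrative.

```python
import math

# rms in Delta log(M) from Table 4 (dex), all methods except the reference 1.A.
rms_table4 = {
    "4.B": 0.175, "6.C": 0.167, "7.D": 0.214, "8.E": 0.228, "9.F": 0.180,
    "10.G": 0.172, "11.H": 0.241, "12.I": 0.181, "13.J": 0.141,
}
sigma_1a = 0.134      # method 1.A: attributed to photometric errors
sigma_median = 0.142  # rms of the median mass (last row of Table 4)

# Intrinsic, method-related scatter: quadrature difference with respect to 1.A.
sigma_method = {
    name: math.sqrt(s ** 2 - sigma_1a ** 2) for name, s in rms_table4.items()
}

# Assumed combination: root-mean-square over the n methods in the dictionary.
n = len(sigma_method)
sigma_m = math.sqrt(sum(s ** 2 for s in sigma_method.values()) / n)

# Same quadrature subtraction applied to the median-mass rms.
sigma_med = math.sqrt(sigma_median ** 2 - sigma_1a ** 2)

print(f"sigma_m = {sigma_m:.3f} dex, median residual = {sigma_med:.3f} dex")
```

With these inputs the two results round to 0.136 dex and 0.047 dex, matching the values quoted in the text.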

A Spearman rank test was performed between the
input, M_input, and estimated mass, M_est, for each galaxy
as measured by applying different codes to the mock
sample in TEST-1. Combined with the Pearson correlation
coefficients, also listed in Table 5, this
confirms very close ranking of the stellar masses measured
from different codes (i.e. the codes consistently
reproduce the mass sequence for galaxies in the catalog).
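
For readers unfamiliar with the two coefficients, a minimal pure-Python sketch is given below: the Spearman coefficient is computed as the Pearson coefficient of the ranks, with tied values sharing their average rank. The function names and toy masses are assumptions for illustration; in practice a statistics library would normally be used instead.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson coefficient of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy masses (log units): the estimates preserve the input ordering.
log_m_input = [8.0, 9.0, 10.0, 11.0]
log_m_est = [8.3, 8.9, 10.4, 10.8]
rho = spearman(log_m_input, log_m_est)
r = pearson(log_m_input, log_m_est)
print(rho, r)
```

Because the toy estimates keep the galaxies in the same mass order, the rank coefficient is 1 even though the linear coefficient is slightly lower, which is the distinction the two columns of Table 5 are meant to capture.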

We conclude that the uncertainties in the estimated
stellar mass depend on the resolution of the color
excess (E(B-V)) and age grids as well as the photometric
S/N ratios. We find an rms scatter of 0.136 dex
in Δ log(M) due to code-dependent effects. The estimated
uncertainty in log(M) due to photometric errors
is 0.134 dex, while using the median mass it reduces to
0.05 dex. No evidence is found for bias in any of the
methods in Table 4. For each galaxy, the median stellar

Figure 2. (2a) Top Panel: Dependence of Δ log(M) on H-band (F160W) magnitudes in TEST-1, showing the sensitivity of Δ log(M) to the
photometric S/N. (2b) Bottom Panel: The same as 2a but for redshift. All the input parameters in this test are fixed, with the only variable
being the codes/methods. These show the sensitivity of the stellar mass to the codes over the range of magnitudes and redshifts covered.

Table 4

The rms scatter, bias and outlier fraction (OLF) in Δ log(M) for the mock sample in TEST-1.

Code     rms     rms (no outliers)   bias¹    outlier fraction¹

1.A      0.134   0.100                0.026   0.014
4.B      0.175   0.127                0.047   0.030
6.C      0.167   0.124                0.023   0.021
7.D      0.214   0.177               -0.059   0.040
8.E      0.228   0.164                0.010   0.049
9.F      0.180   0.110                0.024   0.034
10.G     0.172   0.123                0.000   0.026
11.H     0.241   0.172                0.057   0.060
12.I     0.181   0.129               -0.014   0.032
13.J     0.141   0.106               -0.018   0.015
Median   0.142   0.089                0.010   0.024

Note: ¹ Outlier fraction is defined as the ratio of the number of galaxies with Δ log(M) > 0.5 to the total number of galaxies, where
Δ log(M) = log(M_input) - log(M_est). The bias is defined as mean[Δ log(M)].




Figure 3. Changes in the rms (left), outlier fraction (middle) and bias (right) for different methods in TEST-1, as a function of
the photometric S/N ratio. Different colors represent estimates for the whole sample (black line), with the outliers removed (green line)
and those corresponding to the median mass (red line). The S/N ratio is measured from the F160W band.

Figure 4. Changes in the rms with S/N. The rms is the scatter in stellar mass values (Δ log(M)) for individual galaxies in TEST-1,
based on different methods. The filled circles are the median values in S/N bins, with the error bars corresponding to Poisson statistics.
The scatter at a given S/N represents the dispersion in the stellar mass values among different methods.
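
A minimal sketch of how a curve like the one in Figure 4 could be assembled is shown below: for each galaxy the dispersion of Δ log(M) across methods is computed, and the median of that dispersion is then taken in S/N bins. The use of a population standard deviation for the per-galaxy scatter, the half-open bin edges, the two hypothetical methods and the toy values are all assumptions of this example, not details given in the text.

```python
import statistics


def per_galaxy_scatter(delta_by_method):
    """Dispersion of Delta log(M) across methods, one value per galaxy."""
    per_method = list(delta_by_method.values())
    # zip(*...) regroups the per-method lists into one tuple per galaxy.
    return [statistics.pstdev(values) for values in zip(*per_method)]


def median_scatter_in_bins(snr, scatter, bin_edges):
    """Median per-galaxy scatter in each half-open S/N bin [lo, hi)."""
    medians = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [s for x, s in zip(snr, scatter) if lo <= x < hi]
        medians.append(statistics.median(in_bin) if in_bin else None)
    return medians


# Toy residuals (dex) for two hypothetical methods and four galaxies.
delta_by_method = {
    "A": [0.30, 0.05, 0.00, 0.02],
    "B": [-0.30, -0.05, 0.10, -0.02],
}
snr = [5.0, 15.0, 30.0, 60.0]

scatter = per_galaxy_scatter(delta_by_method)
binned = median_scatter_in_bins(snr, scatter, [0.0, 10.0, 40.0, 100.0])
print(binned)
```

In the toy data the binned scatter falls with increasing S/N, mirroring the qualitative trend described for Figure 4.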


mass between different methods gives the most accurate
stellar mass, with the errors mainly dominated by systematic
effects.

4.2. TEST-1: The Effect of Age and Extinction 
on Stellar Mass Estimates 

A serious problem in stellar mass measurement for
galaxies through SED fitting is the interplay between
the mass, age and extinction, leading to correlated errors
among these parameters. The problem is compounded by
the fact that there is no direct and model-independent
measure for these parameters, although there is an independent
constraint on extinction from mid- to far-IR dust
measurements (e.g. Reddy et al. 2012), which narrows
the range of allowed age and extinction values. Therefore,
the only way to constrain them is through simulations,
where we know a priori the input values for each
galaxy. The mock catalog in TEST-1 also provides the
input age and extinction for each simulated galaxy, providing
a reference with which to compare their predicted
values. In this section we study the uncertainty introduced
to the estimated stellar mass values by the
interplay between age and extinction. Here age is defined
as the time since the onset of star formation and
an exponentially declining SFH is assumed.

Figure 5 shows the dependence of Δ log(M) on the input
(expected) age and extinction (E(B-V)) for different
methods. The sample is divided into three different
age and E(B-V) bins (corresponding to their input values
from the simulation). On the Δ log(M)-E(B-V) and
Δ log(M)-log(age) plots, these respectively correspond
to: 7 < log(age) < 8 (blue); 8 < log(age) < 9 (black);
9 < log(age) < 10 (red) and 0 < E(B-V) < 0.3 (blue);
0.3 < E(B-V) < 0.6 (black); E(B-V) > 0.6 (red). There is
significant scatter in Δ log(M) at a given age or extinction
interval. As expected, the number of old galaxies
(age > 10^9 yrs) with high extinction is small. In particular,
the scatter is higher for younger galaxies, independent
of the extinction.

For some of the models (6.C, 8.E, 10.G, 11.H, 12.I)
in Figure 5, we find a sequence of galaxies with ages
< 10^8 yrs clearly separated from the Δ log(M) = 0 line.
These galaxies all have wrong stellar masses (i.e. large
Δ log(M) values). Furthermore, this does not depend
on a particular code or SED fitting method, as many
of the methods show the same sequence. To identify the
sources of uncertainty in the stellar mass measurement,
we need to understand the cause of such deviations. A
large fraction of the deviant galaxies have intermediate
to high extinctions (E(B-V) > 0.3), indicating they are
likely dusty starburst systems. The degeneracy between
the SED fitting parameters for these galaxies is higher, as
their SEDs mimic both dusty starbursts and quiescent
systems.

We now explore the extent to which age and extinction
are responsible for the sequences seen in Figure
5 and for uncertainties in stellar mass measurement.
Using the input age and extinction values for simulated
galaxies in TEST-1, we compare Δ log(M) with
both Δ log(age) and Δ E(B-V), respectively defined as
Δ log(age) = log(age_est) - log(age_input) and Δ E(B-V) =
E(B-V)_est - E(B-V)_input, for each method, with results
presented in Figure 6. All the experiments show a strong
correlation between deviations in the stellar mass and
age. This indicates that galaxies with uncertain stellar
mass estimates also have uncertain ages (i.e. large
Δ log(M) and Δ log(age) values). In other words, the errors
in the stellar mass and age for mock galaxies, when
constraining other parameters (as in TEST-1), are correlated.
The observed divergence between the age estimates
for younger galaxies (< 10^8 years) is partly due
to the varying M/L ratios among these systems. The
observed trend in Figure 6 is somewhat weaker on the
Δ log(M) vs. Δ E(B-V) plane, indicating that for the
range -0.5 < Δ E(B-V) < 0.5, there is a wide range
in Δ log(M) (-1 < Δ log(M) < 1), caused by differences
in the estimated age values. By restricting the sample
to galaxies with ages > 10^8 years, the observed trend
in Figure 6 in both extinction and stellar mass is reduced.

Given the degeneracy between age and extinction, and
to understand the error budget in the stellar mass estimates,
we now disentangle contributions from these

Table 5

Estimated Spearman rank coefficients (column 1) and Pearson correlation coefficients (column 2) for TEST-1, using M_input/M_est as the
reference mass

Code ID   1      2

1.A       0.91   0.92
4.B       0.98   0.96
6.C       0.98   0.98
7.D       0.96   0.82
8.E       0.96   0.96
9.F       0.98   0.98
10.G      0.98   0.98
11.H      0.96   0.95
12.I      0.96   0.96
13.J      0.99   0.98

Note: ¹ Spearman rank correlation coefficient; ² Pearson correlation coefficient


parameters by dividing the sample into three different
age and extinction intervals and estimating the rms in
Δ log(M) for each interval. The result is a covariance
matrix representing the error budget, where the rows
and columns are the age and extinction respectively, with
the matrix elements being the rms values in Δ log(M), i.e.
the scatter in stellar mass within a given age-extinction grid cell. As in
Figure 5, the sample is divided into age and extinction
intervals: 10^7 < age < 10^8; 10^8 < age < 10^9; 10^9 <
age < 10^10 years and E(B-V) < 0.3; 0.3 < E(B-V) < 0.6;
E(B-V) > 0.6. The error budget matrices corresponding
to each of the methods are presented in Table 6. For
any given method, the elements of the matrix correspond
to the rms scatter in Δ log(M) for a given age and extinction.
Using these error budget matrices, we separate the
relative contributions of method, age and extinction
to the observed uncertainties in the stellar mass.
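
The construction of one such error budget matrix can be sketched as follows: galaxies are assigned to a 3 x 3 grid using the age and extinction intervals quoted above, and the rms of Δ log(M) is computed in each cell. The function names, the half-open treatment of the bin edges, the rms taken about zero and the toy inputs are assumptions made for this illustration.

```python
import math

# Bin edges from the text: log(age/yr) in [7, 8), [8, 9), [9, 10);
# E(B-V) in [0, 0.3), [0.3, 0.6), [0.6, inf). Edge handling is an assumption.
AGE_EDGES = [7.0, 8.0, 9.0, 10.0]
EBV_EDGES = [0.0, 0.3, 0.6, float("inf")]


def bin_index(value, edges):
    """Index of the half-open bin containing value, or None if outside."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    return None


def error_budget_matrix(log_age, ebv, delta_log_m):
    """rms of Delta log(M) per (age, extinction) cell; None for empty cells."""
    sums = [[0.0] * 3 for _ in range(3)]
    counts = [[0] * 3 for _ in range(3)]
    for age, e, d in zip(log_age, ebv, delta_log_m):
        i, j = bin_index(age, AGE_EDGES), bin_index(e, EBV_EDGES)
        if i is None or j is None:
            continue  # galaxy falls outside the tabulated grid
        sums[i][j] += d * d
        counts[i][j] += 1
    return [
        [math.sqrt(sums[i][j] / counts[i][j]) if counts[i][j] else None
         for j in range(3)]
        for i in range(3)
    ]


# Toy inputs for one hypothetical method: three galaxies.
log_age = [7.5, 7.5, 9.5]
ebv = [0.1, 0.1, 0.7]
delta_log_m = [0.3, -0.4, 0.2]
matrix = error_budget_matrix(log_age, ebv, delta_log_m)
print(matrix)
```

Rows follow the age intervals and columns the extinction intervals, matching the layout described for Table 6; repeating the calculation per method yields one matrix per code.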

Overall, the methods agree well per age-extinction
grid. Also, for any given method, the rms scatter in
Δ log(M) (Table 6) increases for redder (E(B-V) > 0.6)
and older (age > 10^9 years) galaxies. The total error
budget matrix (the overall uncertainty in Δ log(M) for
different age and E(B-V)) from all methods combined
is estimated as $\sigma_{ij}$, computed from the elements $\sigma_{ij,k}$, where $\sigma_{ij,k}$ are the
matrix elements at any given age, i, and extinction, j,
grid corresponding to the method, k. The total error
budget matrix is also given in Table 6.

The rms scatter in Δ log(M) from method 1.A is likely
dominated by photometric errors. Therefore, the error
matrix associated with this method in Table 6 provides
a lower limit to uncertainties in the stellar mass measurement
(for any age/extinction combination) caused
by photometric errors.

In conclusion, we find that uncertainties in stellar mass
measurement are coupled with those in age and extinction,
being more tightly coupled with errors in age. The
same galaxies are outliers in both stellar mass and age
regardless of the code used. We find serious degeneracy
for galaxies with ages < 10^8 years, with the rms scatter
in stellar mass increasing for redder and older systems.
Relative contributions due to age and extinction are disentangled
by forming a covariance matrix.

5. TEST-2: EFFECT OF FREE PARAMETERS ON 
STELLAR MASS MEASUREMENT 

The tests performed in the last section were used to
quantify the deviation in the estimated mass of galaxies
(from their expected values) due to different methods
and to disentangle the effects of age and extinction in
stellar mass measurement. Here, we explore the effect
of free parameters (i.e. degeneracies in the SED fits) on
the stellar mass estimates. First, we perform SED fits
to the mock data, allowing all the parameters to be free
(except for the IMF, which is chosen to be Chabrier, and
the redshift, which is fixed to its input value; TEST-2A).
Second, we fix all the parameters in the SED fits
and repeat the analysis (TEST-2B). The participating
teams estimated the stellar masses following the above
prescriptions. By comparing results between TEST-2A
and TEST-2B for each method, we eliminate the code-dependent
effects. The difference then reveals the effect
of free parameters on the stellar mass estimate.

Figures 7a and 7b compare the input and estimated
stellar mass values from different methods for TEST-2A
and TEST-2B respectively. The rms scatter, bias and
outlier fractions are estimated and presented in Table
7. For some of the methods in TEST-2A, there is a
clear bi-modality between the expected and estimated
stellar mass values (e.g. 1.A, 4.B and 6.C). All the methods
underpredict the stellar masses at M < 10^8 M⊙,
with the rms values ranging among the methods from
0.172 dex to 0.394 dex. Also, some of the methods show
a systematic offset of the estimated stellar mass from
its “true” value. In Figure 7a, we also examine
the distribution of galaxies as a function of extinction,
measured for individual galaxies: E(B-V) = 0 (green);
0 < E(B-V) < 0.3 (blue); 0.3 < E(B-V) < 0.6 (black);
E(B-V) > 0.6 (red). There are two clear sequences of galaxies
on the mass comparison plots in Figure 7a (TEST-2A),
separated according to their extinction values. The
sequences are particularly evident for methods 1.A, 4.B, 6.C
and 10.G. For 1.A, there is a clear separation of galaxies
depending on their extinction, with redder galaxies
(E(B-V) > 0.3) having a smaller (estimated) mass. Similar
effects are found for experiments 4.B and 6.C, where
there is a complete absence of sources with high extinction
(E(B-V) > 0.6). Also, sources with medium extinction
(0.3 < E(B-V) < 0.6) are mostly associated with
galaxies with higher stellar masses. This indicates a possible
interplay between stellar mass and extinction when
both parameters are estimated simultaneously through
the SED fits.

The observed bi-modality disappears in TEST-2B
(Figure 7b) when the free parameters are fixed. However,
there is a mass-dependent effect in TEST-2B where
most of the methods underestimate the stellar mass for
low (M < IC^AIq) and high (M > 3 x 10^9 M⊙) mass systems.
TEST-2B confirms that the observed bi-modality

Figure 5. The deviations in the stellar mass estimates from their input values (Δ log(M) = log(M_input) - log(M_est)) from TEST-1 are
plotted against the input age and extinction. Left panels: objects are divided into three different extinction intervals: 0 < E(B-V) < 0.3
(blue); 0.3 < E(B-V) < 0.6 (black); E(B-V) > 0.6 (red). Right panels: objects are divided into three different age intervals: 7 < log(age) < 8
(blue); 8 < log(age) < 9 (black); 9 < log(age) < 10 (red). This separates the contributions due to age, extinction and code/method to
errors in the stellar mass estimates.


detected in TEST-2A is likely caused by the interplay
between the free parameters. The rms values in
$\Delta\log(M)$ are comparable between the two tests, with
TEST-2B having slightly higher rms (Table 7). Both 1.A and
4.B have higher rms values. They use templates generated
from CB07 population synthesis models (for TEST-2A), which
differ from the BC03 model templates used to generate the
mock catalog. Furthermore, 1.A and 4.B use constant and
hybrid star formation histories (for TEST-2A) respectively,
which differ from the exponentially declining model assumed
for the majority of the methods here. Once the population
synthesis model used to generate the template SEDs is
adopted consistently with that for the mock data (BC03), as
in TEST-2B, the observed bi-modalities disappear (Figure 7b;
also see section 6.2). However, for almost all the methods
there is a relatively higher bias in TEST-2B compared to
TEST-2A.
Figure 6. Differences between the estimated and input stellar mass values ($\Delta\log(M)$) are compared with deviations in age ($\Delta\log({\rm age})$)
and extinction ($\Delta E(B-V)$) from TEST-1. The trend between the mass and age residuals indicates that galaxies which have uncertain
mass estimates also have uncertain ages. The residuals in age range from -1 dex to 1 dex while for the extinction they span the range
-0.5 dex to 0.5 dex.


Table 6

Error budget matrices for the methods in Table 3, when applied to TEST-1. The “total” error budget matrix represents the uncertainties
for any given age-extinction grid regardless of the code/method. The uncertainties in stellar mass estimates due to photometric errors
correspond to the error budget matrix associated with method 1.A.

Method               log(Age)   E(B-V) < 0.3   0.3 - 0.6   > 0.6
1.A                  7.5        0.074          0.100       0.251
                     8.5        0.076          0.169       0.250
                     9.5        0.167          0.267       0.165
4.B                  7.5        0.112          0.155       0.242
                     8.5        0.128          0.168       0.310
                     9.5        0.232          0.242       0.394
6.C                  7.5        0.109          0.120       0.265
                     8.5        0.072          0.168       0.321
                     9.5        0.221          0.229       0.164
8.E                  7.5        0.176          0.197       0.344
                     8.5        0.156          0.220       0.374
                     9.5        0.322          0.157       0.346
10.G                 7.5        0.116          0.122       0.230
                     8.5        0.109          0.173       0.314
                     9.5        0.292          0.234       0.504
11.H                 7.5        0.130          0.158       0.501
                     8.5        0.136          0.236       0.313
                     9.5        0.385          0.452       0.585
12.I                 7.5        0.138          0.144       0.250
                     8.5        0.115          0.147       0.237
                     9.5        0.278          0.246       0.440
13.J                 7.5        0.087          0.104       0.275
                     8.5        0.073          0.133       0.228
                     9.5        0.223          0.244       0.281
Total Error Budget   7.5        0.121          0.141       0.307
                     8.5        0.112          0.180       0.297
                     9.5        0.273          0.271       0.387



There is a significant offset in the result for
experiment 10.G in TEST-2A, corresponding to a bias of
0.183 dex. This method uses templates generated from
Maraston (2005) with a hybrid SFH (consisting of
exponentially declining, constant at 0.1, 0.3 and 1 Gyr and
zero afterwards) (Figure 7a). The offset is completely
removed in TEST-2B, where BC03 was adopted. The observed
offset in 10.G shows the sensitivity of the results to
template SEDs generated from the two population synthesis
codes (BC03 vs. M05). The templates resulting from
Maraston (2005) include contributions from pulsating
Asymptotic Giant Branch (AGB) stars, making them
different from the templates based on the BC03 code,
which include less contribution from these stars. This
leads to an underestimation of the stellar mass of galaxies
when including the AGB contribution in the SEDs.
The scatter in methods 11.H and 13.J, based on TEST-2A,
is small with no offsets observed. These methods
both use a SFH and synthetic population models similar
to those adopted in TEST-2A. It is clear from Figures
7a and 7b that using the median of all measured stellar
masses gives smaller rms errors when compared to the
expected stellar mass. However, we note that the median
stellar mass measured for TEST-2A is not meaningful
since the masses from this test are based on different
input parameters (i.e. population synthesis models).

For each method, we estimate the difference in quadrature
between the rms values for TEST-2A and TEST-2B
(rms[2A-2B] = $\sqrt{\sigma_{2A}^2 - \sigma_{2B}^2}$) and
present it in Table 7. This gives the contribution to the
error budget in the stellar mass due to degeneracy in the
SED fits and ranges from 0.037 dex (for 6.C) to 0.264 dex
(for 9.F). In Figure 8 we compare results between different
methods, expressed by their rms and bias (in stellar mass)
estimates, as listed in Table 7. The smallest rms values,
as well as the smallest outlier fractions, are associated
with methods 11.H, 12.I and 13.J. Method 13.J also has
the least bias, indicating that this method provides the
closest mass estimates to the “real” values.
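
The quadrature difference described above can be checked against the rms columns of Table 7. The following is a minimal sketch, not the authors' code: the function name `quadrature_difference` is illustrative, and taking the absolute value of the difference of squares (so that methods whose TEST-2B rms exceeds the TEST-2A rms still give a real number) is an assumption that is consistent with the tabulated values.

```python
import math


def quadrature_difference(rms_a: float, rms_b: float) -> float:
    """Return sqrt(|rms_a^2 - rms_b^2|), the difference in quadrature.

    The absolute value is an assumption of this sketch: it keeps the
    result real when rms_b > rms_a.
    """
    return math.sqrt(abs(rms_a**2 - rms_b**2))


# (TEST-2A rms, TEST-2B rms) for a few methods, copied from Table 7.
table7_rms = {
    "1.A": (0.328, 0.267),
    "6.C": (0.228, 0.225),
    "9.F": (0.219, 0.343),
    "10.G": (0.311, 0.225),
}

for method, (rms_2a, rms_2b) in table7_rms.items():
    value = quadrature_difference(rms_2a, rms_2b)
    print(f"{method}: rms[2A-2B] = {value:.3f} dex")
```

For these four methods the results agree with the rms[2A-2B] column of Table 7 to within rounding.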

The simulations in TEST-2A are the most realistic.
Therefore, it is instructive to further investigate the main
sources of scatter in $\Delta\log(M)$ values based on this
test. In Figure 9 we show $\Delta\log(M)$ distributions as
measured from TEST-2A, plotted in H-band (F160W) magnitude
intervals for each method separately. It is clear that for
any given method, there is an increase in the width of
the distributions from bright to faint magnitudes,
indicating the effect of photometric S/N ratios on the
stellar mass measurement. For some methods, there is an
offset from $\Delta\log(M) = 0$, likely caused by systematic
effects in stellar mass measurement. There are also
differences in the distributions among different methods
even over the same luminosity range. Figure 9 shows that the
median $\Delta\log(M)$ has a narrow distribution at all
luminosities, and is strongly peaked at
$\Delta\log(M) \sim 0$. This indicates that the median of
stellar masses for each galaxy, measured from all the
methods in Table 3, successfully reproduces the input
stellar mass. However, although this is the closest
simulation to real data, the results here should be
interpreted with caution, as the simulations in TEST-2A are
based on “free” input parameters in the fit (i.e. the SFH,
population synthesis templates, metallicities, age and
extinction were not fixed), the effect of which could be
reflected in the median stellar mass (i.e. the input
parameters are not the same among different methods, which
could affect the estimated median values). Considering other
independent results where the majority of the input
parameters are fixed, as listed in Table 4 (and 2nd line in
Table 7), one could assert that the median of the
independently estimated stellar masses gives the closest
agreement with the expected (input) mass.
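
The per-galaxy median over methods discussed above can be sketched as follows. This is an illustration only: the array layout (methods along the first axis), the helper names `median_log_mass` and `rms`, and all numerical values are invented for the example and are not taken from the CANDELS catalogs.

```python
import numpy as np


def median_log_mass(log_mass_by_method: np.ndarray) -> np.ndarray:
    """Per-galaxy median of log10(M) over methods.

    log_mass_by_method has shape (n_methods, n_galaxies).
    """
    return np.median(log_mass_by_method, axis=0)


def rms(values: np.ndarray) -> float:
    """Root-mean-square of an array of residuals (about zero)."""
    return float(np.sqrt(np.mean(np.square(values))))


# Toy numbers, invented for illustration only.
log_m_input = np.array([8.5, 9.0, 9.5, 10.0])
log_m_est = np.array([
    [8.7, 9.2, 9.7, 10.2],  # a method offset high by 0.2 dex
    [8.4, 8.9, 9.4, 9.9],   # a method offset low by 0.1 dex
    [8.5, 9.1, 9.5, 10.0],  # a nearly unbiased method
])

# Delta log(M) = log(M_est) - log(M_input), per method and for the median.
delta_per_method = log_m_est - log_m_input
delta_median = median_log_mass(log_m_est) - log_m_input

print("rms per method:", [round(rms(d), 3) for d in delta_per_method])
print("rms of median :", round(rms(delta_median), 3))
```

In this toy case the median tracks the input masses at least as well as the best individual method, because the offsets of the other two methods have opposite signs; with real data the outcome depends on how the biases of the individual methods are distributed.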

Figures 10a and 10b present the relation between
$\Delta\log(M)$ (from TEST-2A) and photometric S/N ratios
and redshifts, respectively. For most of the methods, an
offset is present in $\Delta\log(M)$ at high S/N ratios,
indicating that the errors in stellar masses are not
necessarily caused by photometric uncertainties. There is an
increase in the scatter at lower S/N values (i.e. fainter
galaxies). Furthermore, we find a clear trend in
$\Delta\log(M)$ as a function of redshift (Figure 10b), with
some methods showing significantly larger scatter in
$\Delta\log(M)$ at a given redshift. At higher redshifts,
all the methods overestimate the stellar masses, while the
same methods underestimate the stellar mass for lower
redshift galaxies. This is similar to the result from
Figure 7, where the stellar masses were underestimated at
$M < 10^{8}\,M_\odot$. The observed trend in Figure 10b is
likely caused by a combination of effects. One is the change
in the functional forms assumed for SFHs at different
redshifts and the diversity of this parameter within the
SAMs. For example, at high-z almost all galaxies have
rising SFHs, while at low-z there is a mix of quenched
and star-forming galaxies. Furthermore, changes in
extinction among galaxies, lower photometric S/N ratios
for some galaxies, or recycling and mass loss could
contribute to the observed trend.

The simulated templates based on the SAMs are generated
from a diversity of SFHs (declining, increasing and
constant), while the methods use simple prescriptions for
the SFHs, causing an inconsistency in the mass estimation
process. To explore if extinction is responsible for
the observed trend and bimodality, we identify galaxies
in Figure 10b by their input $E(B-V)$ values. For method
1.A, high extinction ($E(B-V) > 0.6$) appears to be
responsible for some of the observed bimodality, but this is
not the case for other methods. Methods that show
bimodality in Figure 10b (1.A, 4.B and 10.G) use different
population synthesis models (CB07 and M05) than the
one used in the SAMs (BC03) from which the mock catalog
is constructed. This introduces bias or additional errors
in the mass estimate and hence is responsible for the
observed bimodality and the trend with redshift. This is
particularly the case as the difference in the stellar mass
estimates due to differences in the population synthesis
codes (CB07, M05 and BC03) is dependent on redshift
(see section 6.2). However, method 6.C shows serious
bimodality while using the same stellar synthesis model as
the SAMs. Furthermore, since there is a change in the
photometric S/N ratios with redshift, it is probable that
photometric uncertainties are partly responsible for the
observed trend in Figure 10b. This is explored by
restricting the sample in TEST-2A to galaxies with
S/N > 10. This does not remove the observed bimodality or
the trend in the $\Delta\log(M) - z$ relation, indicating
that photometric errors are not responsible for the observed
distribution of galaxies.

The observed filters refer to different rest-frame wave- 
Figure 7. (a) Top: Comparison between the input and estimated stellar masses for TEST-2A. The colors correspond to the extinction
associated with each galaxy, as estimated from the SED fits: $E(B-V) = 0$ (green); $0 < E(B-V) < 0.3$ (blue); $0.3 < E(B-V) < 0.6$ (black);
$E(B-V) > 0.6$ (red). For methods 10.G and 12.I no $E(B-V)$ values are available. There is a clear bi-modality in some cases. The red line
corresponds to slope 1. Most methods underestimate the stellar masses for galaxies with $M < 10^{8}\,M_\odot$. (b) Bottom: Comparison between
the expected and estimated stellar masses for TEST-2B. The observed bi-modality in TEST-2A largely disappears when parameters are
constrained. This test is designed to study the effects of free parameters on the estimated stellar mass by leaving all the parameters free
(TEST-2A) and by constraining them (TEST-2B).
Table 7

The rms scatter, bias and outlier fraction (OLF) in $\Delta\log(M)$ for TEST-2A (first line) and TEST-2B (second line). The rms[2A-2B] column
gives the difference (in quadrature) between the $\sigma$ values for TEST-2A and TEST-2B, defined as rms[2A-2B] = $\sqrt{\sigma_{2A}^2 - \sigma_{2B}^2}$, and
quantifies the contribution from free parameters to uncertainties in the stellar mass.

Code     rms     rms             rms       bias     outlier
                 (no outliers)   [2A-2B]            fraction
1.A      0.328   0.234           0.191     0.087    0.164
         0.267   0.201                     0.096    0.056
4.B      0.394   0.235           0.085     0.157    0.157
         0.403   0.314                     0.275    0.161
6.C      0.228   0.166           0.037     0.030    0.057
         0.225   0.177                     0.098    0.038
7.D      0.343   0.245           0.133     0.065    0.133
         0.368   0.224                     0.005    0.153
8.E      0.230   0.194           0.165     0.005    0.038
         0.283   0.223                     -0.131   0.079
9.F      0.219   0.189           0.264     0.012    0.029
         0.343   0.220                     0.128    0.132
10.G     0.311   0.261           0.215     0.183    0.096
         0.225   0.170                     -0.009   0.045
11.H     0.202   0.192           0.167     0.132    0.014
         0.279   0.200                     0.119    0.082
12.I     0.203   0.186           0.161     0.066    0.020
         0.259   0.222                     0.152    0.053
13.J     0.172   0.163           0.129     0.026    0.009
         0.215   0.187                     0.095    0.030
Median   0.175   0.168           0.110     0.069    0.008
         0.203   0.174                     0.068    0.028




Figure 8. The rms (bottom) and bias (top) in stellar masses measured from different codes are compared for TEST-2A. In the lower
panel, filled circles represent rms estimates based on all the data while crosses are rms values with outliers excluded.
Figure 9. Histograms of $\Delta\log(M/M_{\rm input})$ values (from TEST-2A) in H-band magnitude intervals, estimated for each method separately.
The rms values corresponding to each distribution are also shown. There is a clear increase in the width of the histograms towards fainter
magnitudes. Also, there is a difference between the methods in terms of the spread in $\Delta\log(M/M_{\rm input})$. The median of the stellar masses
provides narrow distributions, indicating that the median of different independent methods is a stable measure of the stellar mass.


lengths at different redshift intervals. Therefore, the
observed redshift dependence could be due to the fact
that more of the light from shorter wavelengths (i.e.
UV/optical light sensitive to SFR, reddening and age)
is contributing to the observed light from high-z galaxies,
while the longer wavelengths (i.e. optical/infrared
light sensitive to stellar mass) are dominating the light
for low-z galaxies. This inherently introduces a
redshift-dependent bias by weighting the fit towards
different galaxy types. Pforr et al. (2012) showed that
high-z galaxies are easier to fit because the parameter
space for degeneracies (especially age and dust) is more
limited due to the small age of the Universe at those
redshifts. Using rest-frame $U - V$ colors, we divided the
mock catalog into red and blue galaxies and measured their
respective stellar masses in redshift intervals. No
significant difference was found between $\Delta(M)$ values
from these two populations.

It is also important for the observed SEDs to cover the
spectral breaks at any given redshift, as these breaks are
essential for estimating physical parameters of galaxies.
To quantify this, we identified the redshift interval where
a certain break moves in or out of the observed wavelength
range. We then measured and compared the median
$\Delta(M)$ values for the two sets, separating galaxies in
redshift bins into those with and without the spectral
features lying in that bin. If the observed redshift
dependence were due to this effect, we would expect to see a
difference
Figure 10. (a) Top panel: $\Delta\log(M)$ from TEST-2A as a function of the photometric S/N ratios. (b) Bottom panel: $\Delta\log(M)$ from
TEST-2A as a function of redshift. Colors correspond to different extinction values: $E(B-V) = 0$ (green); $0 < E(B-V) < 0.3$ (blue);
$0.3 < E(B-V) < 0.6$ (black); $E(B-V) > 0.6$ (red).


between the median $\Delta(M)$ values in redshift intervals.
We find an average difference of only 0.03 dex between
the $\Delta\log(M)$ values from the two samples, too small
to be responsible for the observed trend by itself.
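
The redshift at which a spectral feature enters or leaves the observed wavelength range follows from $\lambda_{\rm obs} = \lambda_{\rm rest}(1 + z)$. The sketch below illustrates that bookkeeping under stated assumptions: the function names are hypothetical, and the coverage edges `BLUE_EDGE` and `RED_EDGE` are placeholder values chosen for the example, not the actual limits of the 13-filter set used in the paper.

```python
def redshift_at_observed_wavelength(rest_wavelength: float,
                                    observed_wavelength: float) -> float:
    """Redshift at which a rest-frame feature lands on a given observed
    wavelength, from lambda_obs = lambda_rest * (1 + z).

    Both wavelengths must be in the same units.
    """
    if rest_wavelength <= 0:
        raise ValueError("rest_wavelength must be positive")
    return observed_wavelength / rest_wavelength - 1.0


def feature_is_covered(rest_wavelength: float, z: float,
                       blue_edge: float, red_edge: float) -> bool:
    """True if the redshifted feature lies inside [blue_edge, red_edge]."""
    observed = rest_wavelength * (1.0 + z)
    return blue_edge <= observed <= red_edge


# Wavelengths in Angstrom.
LYMAN_LIMIT = 912.0      # approximate rest-frame Lyman limit
BLUE_EDGE = 3500.0       # placeholder blue edge of the coverage
RED_EDGE = 80000.0       # placeholder red edge of the coverage

z_enter = redshift_at_observed_wavelength(LYMAN_LIMIT, BLUE_EDGE)
print(f"Lyman limit enters the assumed coverage at z = {z_enter:.2f}")

# Split a set of redshift bins by whether the feature is covered.
redshifts = [1.0, 2.0, 3.0, 4.0]
covered = [feature_is_covered(LYMAN_LIMIT, z, BLUE_EDGE, RED_EDGE)
           for z in redshifts]
print(dict(zip(redshifts, covered)))
```

With flags like these, the median $\Delta\log(M)$ of the covered and uncovered subsamples can be compared bin by bin, which is the kind of comparison described in the text.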

The effects of free parameters, and in particular the
population synthesis models, are examined by studying
the same relations using the data in TEST-2B, where
all the teams used templates from BC03 (similar to the
ones from which the mock catalogs are generated), zero
extinction was assumed and the free parameters were all
fixed (Table 1). The results are presented in Figures 11a
and 11b. The bi-modality observed for TEST-2A disappears;
however, the trend with redshift is still present.

As mentioned above, a possible cause of the observed
trend in Figures 10b and 11b is the different treatment of
recycling and mass loss in the SAMs compared to the
fitted models. The mock catalog here is generated using
SAMs, which predict the multi-band photometry based
on the BC03 model in the same way as the SED fitting
codes, but predict stellar mass using the instantaneous
recycling approximation, which does not accurately take
into account the stellar mass loss as a function of time.
In this scenario, the stellar mass is underpredicted at an
early epoch after the stellar mass is formed, and
overpredicted at a later epoch when the real stellar mass
loss exceeds the adopted return fraction. We estimate that
the change in the stellar mass due to instantaneous
recycling is around 0.04 dex in $\Delta\log(M)$, with a
clear trend with redshift. The expected trend due to
recycling and mass loss is shown in Figure 11b (green
boxes), indicating that it only plays a minor role in
explaining the observed trend. The conclusion is that
although none of the effects described above could
individually explain the observed trends in Figures 10b and
11b, their combined contribution could fully explain it.

In Figure 12 we compare the rms, bias and outlier
fractions in $\Delta\log(M)$ between TEST-2A and TEST-2B.
The rms values, even after removing the outliers, are still
high ($\sim 0.2$ dex). The green point in Figure 12
corresponds to the median mass. The observed scatter in the
bias and outlier fractions is similar between the two tests,
with methods that have higher rms scatter in TEST-2A also
having high values in TEST-2B.
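
The three summary statistics compared here (rms, bias and outlier fraction of $\Delta\log(M)$) can be computed as in the sketch below. Several choices are assumptions made for illustration and are not taken from the paper: the bias is taken as the mean residual, the rms is computed about zero rather than about the mean, and the outlier threshold of 0.5 dex is a placeholder (the paper's own outlier definition is not given in this section). The function name `mass_residual_stats` and the toy values are likewise invented.

```python
import numpy as np


def mass_residual_stats(log_m_est, log_m_ref, outlier_threshold=0.5):
    """Summary statistics of delta = log_m_est - log_m_ref.

    outlier_threshold (in dex) is a placeholder value for this sketch.
    """
    delta = (np.asarray(log_m_est, dtype=float)
             - np.asarray(log_m_ref, dtype=float))
    is_outlier = np.abs(delta) > outlier_threshold
    kept = delta[~is_outlier]
    rms_kept = float(np.sqrt(np.mean(kept**2))) if kept.size else float("nan")
    return {
        "rms": float(np.sqrt(np.mean(delta**2))),
        "rms_no_outliers": rms_kept,
        "bias": float(np.mean(delta)),
        "outlier_fraction": float(np.mean(is_outlier)),
    }


# Toy values invented for illustration: one galaxy is off by 0.8 dex.
stats = mass_residual_stats([9.1, 9.9, 10.8, 8.0], [9.0, 10.0, 10.0, 8.0])
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

A single discrepant galaxy dominates the full rms in this toy case, which is why the tables report the rms both with and without outliers.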

To explore the effect of photometric errors, for each
galaxy in the TEST-2A simulation we estimate the rms in
$\Delta\log(M)$ between the values measured from different
methods and plot it against the photometric S/N ratios
in Figure 13. The scatter at any given S/N indicates
the rms in $\Delta\log(M)$ among different methods. As
expected, there is significant scatter at lower S/N ratios,
decreasing towards higher values. The median rms values in
S/N intervals are also shown in Figure 13.
For TEST-2A, the rms distribution asymptotes around
rms $\sim 0.2$ dex at S/N > 20. At these high S/N values,
the effect of photometric uncertainties on stellar mass
measurement is negligible and all the scatter is due to
systematic and code-dependent effects. From TEST-1 we
estimated that the contribution to the total rms due to
method/code is 0.136 dex. Subtracting this, in quadrature,
from the total rms for TEST-2A gives rms = 0.146
dex, which is the rms scatter in the stellar mass estimate
due to the effect of the free parameters. We carried out
a linear fit to the median values in Figure 13 and find:
rms = $(-0.013 \pm 0.023)\,{\rm S/N} + (0.409 \pm 0.230)$.
Using this fit, one could estimate the rms values in the
stellar mass for any given photometric S/N ratio.
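
Both numerical steps in this paragraph, the subtraction in quadrature and the evaluation of the linear fit, are simple enough to sketch. The helper names below are illustrative, only the central values of the quoted fit coefficients are used, and the inputs 0.2 and 0.136 dex are the rounded values quoted in the text.

```python
import math


def subtract_in_quadrature(total: float, component: float) -> float:
    """Remove one independent error component from a total rms."""
    if component > total:
        raise ValueError("component exceeds total")
    return math.sqrt(total**2 - component**2)


# Central values of the linear fit quoted in the text.
SLOPE = -0.013
INTERCEPT = 0.409


def rms_from_snr(snr: float) -> float:
    """Evaluate rms = SLOPE * S/N + INTERCEPT (central values only)."""
    return SLOPE * snr + INTERCEPT


free_parameter_rms = subtract_in_quadrature(0.2, 0.136)
print(f"rms from free parameters: {free_parameter_rms:.3f} dex")
print(f"fit at S/N = 10: {rms_from_snr(10):.3f} dex")
```

With the rounded inputs the subtraction gives about 0.147 dex, close to the quoted 0.146 dex; the small difference presumably reflects rounding of the inputs. Note also that the central-value line reaches zero near S/N of about 31, so it should only be applied over the limited S/N range spanned by the median points in Figure 13, and the quoted coefficient uncertainties are large.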

In conclusion, using realistic simulations from TEST-2A,
we find the difference between the input (expected)
and estimated masses ($\Delta(M)$) to follow a distribution
that broadens from bright to faint magnitudes. At a given
magnitude interval, while some methods show a relatively
larger scatter in $\Delta(M)$, some show a systematic offset
from the $\Delta(M) = 0$ line. The observed offset in
stellar mass is likely due to degeneracy between the free
parameters (i.e. age and extinction). The offset is
significantly reduced in TEST-2B where the input parameters
are fixed. A trend was found between $\Delta(M)$ and
redshift for both TEST-2A and TEST-2B. The most likely cause
is the diversity of the SFHs used in the SAMs, from which
the mock catalogs were constructed (and the fact that the
methods mostly use simplified SFHs).


6. TEST-3: UNCERTAINTIES IN STELLAR MASS 
MEASUREMENT FROM OBSERVED DATA 

6.1. Internal Tests of Stellar Mass Measurement Methods

In this section we study the internal consistency in
stellar mass estimates between different methods, using
observed data (TEST-3). Unlike the mock catalogs, in the
case of the observational data we do not have prior
knowledge of the expected stellar masses and therefore this
only provides an internal test of the consistency of mass
measurements. Given this, we need to define a “reference”
mass as a base with which to compare all the other masses.
Since the median mass is shown to be relatively unbiased
(e.g. Figure 9), we adopt that as the “reference” mass.
We note that the median of the stellar masses based on
methods with different input parameters (TEST-3A) is
not meaningful. Also, any bias in individual mass
estimates would be reflected in the median. However,
this is only aimed to provide a relative test between
different methods and the choice of the “reference” mass
will not affect the results in this section. Furthermore,
while a mass estimated from any other method here is
equally acceptable as the “reference”, it would still be
susceptible to the above problems.

In Figure 14 we compare stellar masses predicted from
different methods using TEST-3A and TEST-3B with
the median mass, $M_{\rm med}/M_\odot$, for each method.
The rms, bias and outlier fractions in
$\Delta\log(M_{\rm med}) = \log(M_{\rm est}) - \log(M_{\rm med})$
are estimated and listed in Table 8. These
should only be considered as relative measures, providing
estimates of the overall agreement between masses
from different methods and for individual methods between
TEST-3A and TEST-3B. In the case of TEST-3A,
some methods show large scatter (e.g. 1.A and 4.B)
and large outlier fraction (8.E) while others closely agree
(11.H and 12.I). The behavior of these methods is
consistent with results from the mock catalogs (TEST-2A).
Also, the scatter between the estimated stellar masses
from individual methods and the “reference” values is
significantly reduced for most methods when using TEST-3B
(Figure 14 and Table 8). The sum of the rms values
(in quadrature) in Table 8 gives the dispersion between
different methods, corresponding to 0.39 dex (for TEST-3A)
and 0.22 dex (for TEST-3B). The scatter in TEST-3A
constitutes all the observational errors including
photometric uncertainties and errors in the SED fitting
process and hence provides an estimate of the observed error
associated with mass measurement from any given technique.
The reduction in the rms scatter in TEST-3B is a
result of the constraints placed on the free parameters
and the way different methods handle the SED fits.
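
The two dispersions quoted above can be reproduced from the rms column of Table 8 if the "sum in quadrature" is read as the root-mean-square over the nine methods. That reading is an inference from the numbers rather than something stated in the text, and the sketch below (with the illustrative helper `root_mean_square`) is only a consistency check.

```python
import math

# rms column of Table 8 for the nine methods, in the order
# 1.A, 4.B, 6.C, 7.D, 8.E, 10.G, 11.H, 12.I, 13.J.
RMS_TEST_3A = [0.327, 0.378, 0.223, 0.294, 0.938, 0.235, 0.132, 0.114, 0.146]
RMS_TEST_3B = [0.093, 0.247, 0.167, 0.268, 0.365, 0.143, 0.225, 0.183, 0.102]


def root_mean_square(values):
    """Square root of the mean of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


print(f"TEST-3A: {root_mean_square(RMS_TEST_3A):.2f} dex")
print(f"TEST-3B: {root_mean_square(RMS_TEST_3B):.2f} dex")
```

A plain sum of squares without dividing by the number of methods would give a much larger value for TEST-3A (above 1 dex), so the mean-square reading is the one that matches the quoted 0.39 and 0.22 dex.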

Figure 15 examines the consistency of the stellar mass
and extinction estimates between different methods. For
any pair of methods, we find the difference between their
estimated stellar mass ($\Delta(M)$) and extinction
($\Delta({\rm ext})$) values. Since these parameters are
estimated simultaneously from the SED fits, this provides a
direct and unbiased test of the consistency of the stellar
mass and extinction estimates between different methods. It
is clear that, in all cases, there is a shift on the
$\Delta\log(M) - \Delta({\rm ext})$ plane from the
$\Delta\log(M) = \Delta({\rm ext}) = 0$ point for any two
methods compared. Some methods agree on their
estimated stellar mass and some on the extinction but
Figure 11. (a) Top panel: $\Delta\log(M)$ from TEST-2B as a function of the photometric S/N ratios. (b) Bottom panel: $\Delta\log(M)$ from
TEST-2B plotted as a function of redshift. The blue boxes indicate the mean $\Delta\log(M)$ values in redshift bins based on
TEST-2A (Figure 10b). This shows changes in $\Delta\log(M)$ per redshift interval between TEST-2A and TEST-2B. The green boxes show the
expected trend due to recycling and mass loss (see the text for details). The green line connects the green boxes.


none of the pairs of methods agree on both.

6.2. Dependence of the Stellar Mass on 
Population Synthesis Models 

The template SEDs generated by population synthesis
models are the most fundamental components in measuring
stellar mass of galaxies. It is therefore instructive
to quantify the effect of the population synthesis models
on the estimated mass of galaxies, given differences in
the composition and data libraries used in these models.
Here we estimate stellar masses using templates generated
from BC03 and CB07 models while keeping all the
rest of the parameters the same. The main difference
between these two models is the addition of pulsating
Asymptotic Giant Branch stars to the CB07 model. For
this experiment we use observational data from TEST-3,
consisting of a sample of 586 galaxies. All the galaxies
in this sample have spectroscopic data, used to fix
redshifts of the galaxies when performing the SED fits.
Since this is an internal comparison between results from
Figure 12. Comparison between the rms, bias and outlier fractions for TEST-2A (horizontal) and TEST-2B (vertical). The green point
corresponds to the median mass from all measurements.




Figure 13. Left panel: The photometric S/N ratios are plotted vs. the rms in $\Delta\log(M)$ for each galaxy, measured from different methods.
Right panel: the same as the left panel but plotted over a limited range of S/N values. The boxes are the median $\Delta\log(M)$ values measured
in S/N intervals. The line is the least squares fit to the median points. The equation of the line can be used to estimate the rms values in
stellar mass as a function of the S/N ratios. The S/N ratios correspond to the H-band (F160W) photometry.


Table 8

The rms scatter, bias and outlier fraction (OLF) in $\Delta\log(M_{\rm med}) = \log(M_{\rm est}) - \log(M_{\rm med})$ for different methods applied to TEST-3A
(first line) and TEST-3B (second line). $M_{\rm med}$ is the median of the stellar masses for a given galaxy, measured by different methods.

Code     rms     rms                  bias     outlier
                 (outliers removed)            fraction
1.A      0.327   0.256                0.111    0.108
         0.093   0.080                0.046    0.005
4.B      0.378   0.203                0.100    0.074
         0.247   0.230                0.171    0.028
6.C      0.223   0.206                -0.040   0.014
         0.167   0.164                -0.130   0.003
7.D      0.294   0.149                -0.007   0.055
         0.268   0.114                0.043    0.044
8.E      0.938   0.297                -0.237   0.345
         0.365   0.242                -0.194   0.196
10.G     0.235   0.216                0.147    0.014
         0.143   0.125                -0.101   0.003
11.H     0.132   0.106                -0.003   0.014
         0.225   0.184                0.137    0.029
12.I     0.114   0.098                0.030    0.003
         0.183   0.175                0.128    0.010
13.J     0.146   0.146                -0.110   0.00
         0.102   0.102                -0.012   0.00
Figure 14. Relations between the median of stellar masses from all the methods and the estimated stellar mass from individual
methods for both TEST-3A (left panels) and TEST-3B (right panels). There is a significant reduction in the scatter and outlier fraction
in the case of TEST-3B.
Figure 15. Comparison of the difference in the stellar mass and extinction ($\Delta\log(M)$ and $\Delta({\rm ext})$, respectively) between any two methods
using TEST-3A applied to 586 observed galaxies in GOODS-S. The scatter in these diagrams around the center ($\Delta\log(M) = \Delta({\rm ext}) = 0$)
indicates that stellar mass measurement methods cannot at the same time produce both the stellar mass and extinction.
the two models, there is no dependence on the “true”
stellar mass values. The difference between the stellar
masses using templates from BC03 and CB07 is plotted
against redshift and mass in Figure 16, showing an offset
of $\sim 0.2$ dex in $\log(M_{\rm BC03}/M_{\rm CB07})$, with
higher masses from BC03. The difference between the stellar
masses reduces at higher redshifts ($z > 3$) while it is
constant over the entire stellar mass range studied here.

Given that the observational sample here is confined to
brighter galaxies (for which spectroscopic data are
available), it is possible that the above result is biased.
To examine this, we apply the same procedure to the
simulated data in TEST-2A (which is the most realistic).
We find a similar shift of $\sim 0.2$ dex in
$\log(M_{\rm BC03}/M_{\rm CB07})$ as for the observational
data. The simulated galaxies also show closer agreement
between the stellar masses at higher redshifts, in agreement
with the result from TEST-3. The observed offset here is
mainly due to the addition of the pulsating AGB stars to the
CB07 model, affecting the near-infrared part of the SEDs
generated from it. At higher redshifts ($z \sim 3$) the
rest-frame near-infrared light shifts outside the wavelength
range spanned by the SEDs here and hence the results
(stellar masses) become unaffected by the AGB contribution,
leading to a better agreement in the estimated stellar mass
between BC03 and CB07.

6.3. Uncertainties in Stellar Mass Measurement 
due to Contributions from Nebular Emission 

It is well known that the contribution from nebular
emission lines is a non-negligible component of the observed
flux from a galaxy at certain redshifts, leading to
appreciable differences in the parameters estimated from
their SEDs (de Barros et al. 2014; Schenker et al. 2013). In
the absence of correction for nebular emission, one
overestimates both the stellar mass and age, as the nebular
emission mimics an increase in the observed flux at longer
wavelengths and an enhancement of Balmer breaks, and hence
increased mass and age.

However, it is difficult to accurately quantify the
effect of nebular emission on the estimated stellar mass of
galaxies, as it depends on the redshift of the galaxy, the
filters used for the SED fitting process and the width of
the filters. For example, the H$\alpha$ line shifts into the
IRAC 3.6 $\mu$m band at $z \sim 3.1$. Depending on the width
of the filter, we get different fractional contributions to
the observed fluxes. Therefore, the contribution due to
nebular emission lines needs to be taken into account
depending on the redshift of the galaxy in question and the
filters used.

To quantify this, we estimated the stellar masses with 
and without correction for nebular emission lines, using 
the observed SEDs in TEST-3, keeping all the rest of the 
parameters fixed. We find a difference of up to ~ 0.3 dex 
in the estimated stellar mass, purely due to contribution 
from nebular emission lines. 

In conclusion, differences in the stellar mass and
extinction between different methods when using
observational data confirm that the majority of the methods
do not converge on the estimates of both the stellar
mass and extinction. The dependence of the stellar mass on
population synthesis models was investigated, and it was
found that inclusion of pulsating AGB stars would decrease
the estimate of the stellar mass by 0.2 dex. Finally, it was
found that the contribution from nebular emission lines
is to increase the stellar mass of galaxies by up to
$\sim 0.3$ dex, depending on the redshift of the galaxy in
question.
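
Offsets in dex translate to multiplicative factors in mass through $M_1/M_2 = 10^{\Delta\log M}$. The short sketch below (helper names are illustrative) converts the 0.2 dex and 0.3 dex offsets quoted in this section into linear factors, which may make their size easier to judge.

```python
import math


def dex_to_factor(dex: float) -> float:
    """Multiplicative factor corresponding to an offset in dex."""
    return 10.0 ** dex


def factor_to_dex(factor: float) -> float:
    """Offset in dex corresponding to a multiplicative factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return math.log10(factor)


for label, dex in [("AGB treatment (BC03 vs. CB07)", 0.2),
                   ("nebular emission lines", 0.3)]:
    print(f"{label}: {dex} dex = factor {dex_to_factor(dex):.2f}")
```

A 0.2 dex offset corresponds to a factor of roughly 1.6 in stellar mass, and 0.3 dex to roughly a factor of 2.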

7. TEST-4: THE EFFECT OF NEAR-INFRARED 
PHOTOMETRIC DEPTH AND SELECTION 
WAVELENGTH ON THE OVERALL STELLAR 
MASS 

To investigate the effect of near-infrared photometric
depth on the estimated stellar mass, we designed TEST-4,
which is similar to TEST-3A except that it is based on a
z-band selected sample (TEST-3A was based on an H-band
(F160W) selected sample) and uses shallower near-infrared
(JHK) data. Since this test also relies on real data, we
do not know the expected stellar masses. Using the median
of the stellar masses measured for each galaxy by the
different methods, $M_{\rm med}$, we estimate
$\Delta\log(M_{\rm med}) = \log(M_{\rm est}) - \log(M_{\rm med})$
for individual galaxies. The rms in $\Delta\log(M_{\rm med})$
is then calculated for each method using all the galaxies,
and for each galaxy using the measurements from the
different methods. The results from TEST-3A and TEST-4
are compared in Figure 17, which also compares the rms
values after outliers are removed and the bias estimates.
There is no significant difference in the average mass
estimates between the optical (z-band) and near-IR
(H-band) selected samples. Also, no difference is found
due to the relatively shallower near-IR photometry in
TEST-4. Figure 18 presents the median rms values in S/N
bins for TEST-3A and TEST-4. There is no significant
difference between the masses estimated in these two
tests as a function of S/N ratio, with both converging to
rms = 0.2 at S/N > 40.
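
The median-referenced statistics used in this test can be
sketched as follows. This is a minimal illustration, not
the code used by any of the teams: the function name and
the array layout (one row per method, one column per
galaxy) are assumptions, and the rms is taken here as the
root-mean-square of the deviations from the median.

```python
import numpy as np

def median_referenced_scatter(log_mass):
    """Scatter of stellar mass estimates about the per-galaxy median.

    log_mass: array-like of shape (n_methods, n_galaxies) holding
    log10(M_est) from each method (layout assumed for this sketch).
    """
    log_mass = np.asarray(log_mass, dtype=float)
    # Median over methods, evaluated separately for each galaxy.
    log_m_med = np.median(log_mass, axis=0)
    # Delta log(M_med) = log(M_est) - log(M_med)
    delta = log_mass - log_m_med
    # rms for each method, using all the galaxies.
    rms_per_method = np.sqrt(np.mean(delta**2, axis=1))
    # rms for each galaxy, using all the methods.
    rms_per_galaxy = np.sqrt(np.mean(delta**2, axis=0))
    # Mean offset of each method from the median (the "bias").
    bias_per_method = np.mean(delta, axis=1)
    return delta, rms_per_method, rms_per_galaxy, bias_per_method
```

Binning `rms_per_galaxy` by photometric S/N and taking the
median in each bin would give curves of the kind compared
in Figure 18.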

In conclusion, no significant difference is found in the 
estimated stellar masses due to the selection wavelength 
of the survey or the depth of the near-IR data alone. 

8. COMPARISON WITH OTHER STUDIES 

In recent years several studies have addressed the
dependence of the stellar mass on physical parameters
using simulated catalogs with known input values
(Wuyts et al. 2009; Lee et al. 2009; Pforr et al. 2012).
Longhetti & Saracco (2009) studied the dependence of
the estimated stellar masses on age, metallicity, IMF and
SFH for early-type galaxies at 1 < z < 2, using different
stellar population synthesis codes to model their SEDs.
They found that, at a given IMF, the stellar masses cannot
be recovered to better than a factor of 2-3.

Using model templates based on BC03, assuming a
Calzetti extinction law with reddening in the range 0-4,
and three SFHs (SSP, constant SFR and a $\tau$ model with
$\tau$ = 0.3 Gyr), Wuyts et al. (2009) generated mock
catalogs in the redshift range 1.5 < z < 3. When keeping
the redshift fixed, they underestimated the reddening,
stellar mass and SFRs; however, these estimates improved
when redshift was used as a free parameter in the fit.
While correctly predicting the properties of spheroidal
galaxies, they failed to reproduce the input parameters
for star-forming systems. Their results agree well with
the independent study by Pforr et al. (2012).

Concentrating only on a simulated sample of Lyman
Break Galaxies at z ~ 3.4, 4 and 5, and using BC03,
Lee et al. (2012) found that both masses and SFRs are
underestimated while the ages are overestimated. They
attributed this to differences in the SFHs between the
mock and $\tau$-model templates used in the fitting
process. They further showed that data spanning a long
wavelength range are essential to best recover the input
parameters.

Figure 16. Comparison between the stellar mass estimates based on the BC03 and CB07 population synthesis models using observational
data from TEST-3. All the parameters in the SED fits are fixed, with the only difference being the population synthesis models which
generate the template SEDs.

Pforr et al. (2012) performed a comprehensive study
of uncertainties in the estimated physical parameters
of galaxies. Based on the SED fitting code of
Bolzonella et al. (2000) and population synthesis models
from Maraston (2005), they found that the most important
parameter in recovering the stellar mass is the SFH,
in agreement with Maraston et al. (2010). This underlines
the importance of the physics of the model templates
used in the SED fitting process. Using mock passive and
star-forming galaxies in the redshift range 0.5 < z < 3,
they examined the sensitivity of the stellar mass to
redshift. When spectroscopic redshifts are known, they
find the best stellar mass estimates at low redshift when
reddening is excluded, and at high redshift using
reddening and inverted tau models (Pforr et al. 2012).
The inclusion of reddening at low redshift causes severe
underestimation of the stellar masses. When redshift is a
free parameter in the fit (e.g. when no spec-z are
available), the additional degree of freedom allows for
better mass estimates because redshift compensates for
SFH and metallicity mismatch as well as the age-dust
degeneracy (Pforr et al. 2013). This agrees well with
results from the current study. At low redshifts, masses
are still best determined by excluding reddening from the
fit.

In this paper we performed a critical study to quantify
differences between the stellar masses estimated using
different methods with model templates from different
population synthesis codes, considering the existing
degeneracies between the physical parameters. By fixing
the physical parameters (specifically redshifts), we find a
larger difference between the predicted and expected
stellar masses. For example, allowing the redshift to vary
in the fit compensates for the mismatch in SFH and
metallicity and for the age-dust degeneracy, and hence
improves the recovery. In agreement with previous studies,
we find that our lack of knowledge of the correct SFH,
combined with the inherent degeneracy between age, dust
and metallicity, is the main reason for uncertainties in
stellar masses. Moreover, the estimated uncertainty
depends on the wavelength coverage at any given redshift.
We also investigated the effect of photometric
uncertainties on these parameters and confirm that their
effect is less serious than that of the factors above.

Figure 17. Comparison of the rms values in $\Delta\log(M_{\rm med}) = \log(M_{\rm est}) - \log(M_{\rm med})$, measured from different codes, between TEST-3A and
TEST-4. $M_{\rm med}$ is the median of the mass estimates for individual galaxies from different codes. Also presented are comparisons
between the rms values with the outliers removed and between the biases resulting from the different methods.

Figure 18. Median of the rms values in $\Delta\log(M_{\rm med}) = \log(M_{\rm est}) - \log(M_{\rm med})$ estimated in S/N intervals. $M_{\rm med}$ is the median of the
mass estimates for individual galaxies from different codes. The plot shows results for both TEST-3A and TEST-4. The lines are the best
fits to the median values.

9. THE ERROR BUDGET 

In this section we quantify and compare the relative
contributions from the main sources dominating the
uncertainties in the stellar mass measurement. The
situation is complicated by the fact that these parameters
are correlated. Therefore, one needs to disentangle
their individual contributions, as investigated with the
simulations in previous sections. In general terms, the
uncertainty is defined as the rms scatter in
$\Delta\log(M) = \log(M_{\rm est}) - \log(M_{\rm input})$.
The uncertainties in the stellar mass due to different
parameters are listed in Table 9 and explained below:

Photometric errors: We examined this by estimating
stellar masses for galaxies in mock catalogs (with known
input mass) over a range of magnitudes. By using the
same parameters to fit the SEDs as those used to generate
the catalogs, we minimize the effect of other (free)
parameters (TEST-1). Furthermore, by concentrating
on individual codes, we avoid any code-dependent effect
(Table 3). Taking the above points into account, we
estimate an uncertainty of $\sigma(\Delta\log(M))$ = 0.134 dex
due to photometric errors. This dominates the error budget
for galaxies with m(F160W) > 26 (Figure 2a).
Codes/Methods: This was specifically tested by generating
a simulated mock catalog (TEST-1) and constraining
the input parameters, with the only free parameter
being the code/method used. After subtracting the
uncertainties due to photometric errors, we list the
scatter $\sigma(\Delta\log(M))$ among the different codes in
Table 4. We estimate an rms scatter of
$\sigma(\Delta\log(M))$ = 0.136 dex due to differences in the
methods/codes used.

Table 9
Error Budget for Stellar Mass Measurement

Source                          rms in log(M/M_⊙)
Method                          0.136
Systematic                      0.050
Photometry                      0.135
Numerical                       0.045
log(Age)                        E(B-V): (< 0.3, 0.3-0.6, > 0.6)
  7-8                           (0.121, 0.141, 0.307)
  8-9                           (0.112, 0.180, 0.297)
  9-10                          (0.273, 0.271, 0.387)
Free Parameters                 0.110
Nebular lines                   0.300
Combined Observational          0.390
Population Synthesis Models     0.200
Depth (near-IR photometry)      < 0.200

Age and Extinction: in order to disentangle the effects
of age and extinction and estimate their individual
contributions to the error budget, we constructed a
covariance matrix with $\sigma(\Delta\log(M))$ as matrix
elements, measured in different age-extinction grids. All
the other variables were kept fixed. Results are listed in
Table 6 and presented in Figure 5. The highest rms scatters
were found for high extinction ($E(B-V) > 0.6$) and high
age (~ $10^{9.5}$ yr) values.
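
A grid of this kind can be illustrated with a short
sketch. The bin edges below follow the log(Age) and
E(B-V) intervals listed in Table 9; the function name, the
input arrays and the treatment of empty cells (left as
NaN) are assumptions made for illustration and are not
taken from the analysis itself.

```python
import numpy as np

def rms_grid(log_age, ebv, delta_log_m,
             age_edges=(7.0, 8.0, 9.0, 10.0),
             ebv_edges=(0.0, 0.3, 0.6, np.inf)):
    """rms of Delta log(M) in cells of log(age) and E(B-V).

    Rows index the log(age) bins, columns the E(B-V) bins.
    Cells containing no galaxies are returned as NaN.
    """
    log_age = np.asarray(log_age, dtype=float)
    ebv = np.asarray(ebv, dtype=float)
    delta = np.asarray(delta_log_m, dtype=float)
    n_age, n_ebv = len(age_edges) - 1, len(ebv_edges) - 1
    grid = np.full((n_age, n_ebv), np.nan)
    for i in range(n_age):
        in_age = (log_age >= age_edges[i]) & (log_age < age_edges[i + 1])
        for j in range(n_ebv):
            in_ebv = (ebv >= ebv_edges[j]) & (ebv < ebv_edges[j + 1])
            cell = in_age & in_ebv
            if cell.any():
                # Root-mean-square deviation of the galaxies in this cell.
                grid[i, j] = np.sqrt(np.mean(delta[cell] ** 2))
    return grid
```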

Numerical/Systematics: even if all the above
uncertainties are accounted for, we still have an inherent
“base” error, independent of the photometric and
code-dependent errors and of the degeneracies mentioned
above. We estimate this to be $\sigma(\Delta\log(M))$ = 0.047 dex
(Figures 3 & 4).

Free Parameters: this is estimated by performing a
realistic simulation in which all the free parameters in
the SED-fitting process were allowed to change (TEST-2A)
and comparing the results with a similar test in which the
parameters were kept fixed (TEST-2B). By subtracting
(in quadrature) the rms estimates from TEST-2A and
TEST-2B, we find a scatter among different methods
(due to free parameters) in the range
$\sigma(\Delta\log(M))$ = 0.037 dex to 0.264 dex (Table 7). The
rms scatter associated with free parameters from the
median stellar mass values (from TEST-3A) is 0.110 dex,
which is taken as our estimate of the uncertainty in the
stellar mass measurement caused by free parameters.
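
The quadrature subtraction used above can be written out
explicitly. The sketch below is illustrative only: the
function names are hypothetical, the numbers in the usage
note are not taken from any table in this paper, and the
operation assumes that the two error terms are
independent.

```python
import math

def subtract_in_quadrature(rms_total, rms_reference):
    """Return sqrt(rms_total**2 - rms_reference**2).

    rms_total: scatter with the extra source of error present
    (e.g. free parameters allowed to vary).
    rms_reference: scatter with that source removed
    (e.g. parameters held fixed).
    """
    if rms_total < rms_reference:
        raise ValueError("rms_total must be >= rms_reference")
    return math.sqrt(rms_total**2 - rms_reference**2)

def add_in_quadrature(*terms):
    """Combine independent rms terms: sqrt(sum of squares)."""
    return math.sqrt(sum(t * t for t in terms))
```

For example, `subtract_in_quadrature(0.5, 0.3)` isolates a
contribution of 0.4 dex, and `add_in_quadrature(0.3, 0.4)`
recovers the total of 0.5 dex.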

Combined Observational Uncertainties: using the
observed data (TEST-3A), we measure the scatter in the
estimated stellar mass values among different codes. This
is estimated to be $\sigma(\Delta\log(M))$ = 0.390 dex.

Selection Wavelength and Photometric Depth:
TEST-4 was formulated to address this and predicts a
contribution < 0.2 dex to the total error budget due to
the selection wavelength and photometric depth of
the sample.

Nebular Line Correction: In TEST-3 (based on the
observational data), we compared the stellar mass
estimates with and without correction for nebular emission
(both line and continuum). We estimate an average error
of 0.3 dex in the stellar masses due to the contribution
from nebular emission (Table 9).

Population Synthesis Models: The templates used 
to measure stellar masses were generated by population 
synthesis models. We studied the effect of pulsating AGB 
stars on these templates and on the resulting stellar mass 
and find this to change the stellar mass by ~ 0.2 dex. 

10. SUMMARY AND CONCLUSIONS 

We performed a detailed study of the errors and main
sources of uncertainty in stellar mass measurement in
galaxies. Generating simulated galaxy catalogs with
known input parameters (redshift, mass, SEDs), we
investigated deviations in the estimated stellar mass from
their input values ($\Delta\log(M)$) and its dependence on the
observable parameters. The stellar masses were measured
by ten independent methods/codes with the results
compared. Conclusions from this study are summarized
below:

• When the same set of input assumptions is used,
no significant bias is found between different
methods. We find that the spread in the stellar mass
of any given galaxy, using different methods, is
$\sigma(\Delta\log(M))$ = 0.136 dex. Fainter galaxies with
lower photometric S/N ratios (H > 26 mag) are
responsible for most of this scatter.

• When the same population synthesis models and
parameters are used, the median of the stellar
masses from different methods provides the smallest
rms scatter (with respect to the input stellar
mass values) compared to individual methods.

• We separated degeneracies in stellar mass
measurements due to age and extinction and estimated
their individual contributions to the total error
budget. We find that the rms in stellar mass
significantly increases for $E(B-V) > 0.6$ for all ages.
For any given method and extinction, there is an
increase in the estimated stellar mass for ages
> $10^{8.5}$ years.

• From our simulations we found that errors in the 
stellar mass and age are strongly correlated (galax¬ 
ies with large deviations in their stellar mass also 
show large deviations in age). A weaker trend is 
found with the extinction. 

• The effect of free parameters on stellar mass
estimates was studied using mock photometric
catalogs with known input stellar mass. We find
$\sigma(\Delta\log(M))$ = 0.110 dex (Table 9), caused by
degeneracy and interplay between parameters.

• The effects of population synthesis models and
correction for nebular emission were investigated and
found to change the stellar mass ($\Delta\log(M)$) by
0.2 dex and 0.3 dex respectively.

Appendix I: Definitions of Stellar Mass in Galaxies

There are three different definitions of the stellar mass
commonly used in the literature.

1. Stellar mass is built up over time by star formation
activity in galaxies, with stellar mass recycling ignored.
If $\phi(t)\,dt$ is the stellar mass generated in a galaxy
between time $t$ and $t + dt$ with a star formation rate
$\phi(t)$, the stellar mass over the age of the galaxy is
$M_{\rm int} = \int_0^{t_g} \phi(t)\,dt$,
where $t_g$ is the current age of the galaxy. In this case
the stellar mass depends on the SFH of the galaxy. Assuming
an exponentially declining SFR,
${\rm SFR}(t) = {\rm SFR}_0\,e^{-t/\tau}$, and integrating
from $t = 0$ to $t_g$, the stellar mass is calculated for
each object as

$M_{\rm int} = \tau\,{\rm SFR}_0\,(1 - e^{-t_g/\tau})$

where ${\rm SFR}_0$ is the SFR at $t = 0$ and $\tau$ is the
SFR time scale.

2. Stellar mass recycling is taken into account using the
‘instantaneous recycling approximation’. In this case a
fixed fraction of the mass that goes into stars is returned
to the Inter-Stellar Medium (ISM) immediately in each
timestep, to take into account the stellar mass loss in
supernova explosions or stellar winds. In any given time
interval, $\Delta t$, the increment of the stellar mass is
the star formation rate minus the mass fraction of the
short-lived stars and stellar winds, multiplied by the
time interval. Therefore, the stellar mass of a galaxy at
the age $t_g$, assuming the Instantaneous Recycling
Approximation, $M_{\rm ira}$, is estimated as

$M_{\rm ira} = \int_0^{t_g} \left(\phi(t) - \phi(t)\,R_{\rm re}\right) dt = (1 - R_{\rm re}) \int_0^{t_g} \phi(t)\,dt$

where $\phi(t)$ is the SFR at time $t$ and $R_{\rm re}$ is
the recycling fraction, which is set to be a constant and
depends on the IMF. Most SAMs adopt this prescription.

3. Stellar mass recycling is treated using detailed
predictions of stellar population models for how much mass
is returned from a stellar population of a given age in
each timestep (e.g. Lu et al. 2014). The stellar mass of
a galaxy at the current age $t_g$ depends on the star
formation history and the mass loss from all stars formed
in the past and is estimated as

$M_* = \int_0^{t_g} \left[\phi(t) - \phi(t)\,R_{\rm re}(t_g - t)\right] dt$

where $R_{\rm re}(t_g - t)$ is the recycling fraction at
time $t_g$ for the stellar mass formed at time $t$. The
stellar mass of galaxies strongly depends on their SFH,
with the recycled mass mainly depending on the IMF and age,
with a secondary dependence on the metallicity of the
stellar population. We show in section 6 that the stellar
mass of galaxies weakly depends on the stellar mass loss.
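
The three definitions can be compared numerically for an
exponentially declining SFH. The sketch below is a minimal
illustration under stated assumptions: the function names
are hypothetical, times are in years and SFRs in solar
masses per year, and `toy_return_fraction` is an invented
smooth function standing in for the age-dependent
recycling fraction, which in practice comes from a stellar
population model.

```python
import numpy as np

def sfr_exponential(t, sfr0, tau):
    """Exponentially declining SFR: SFR(t) = SFR_0 * exp(-t / tau)."""
    return sfr0 * np.exp(-t / tau)

def trapezoid(y, x):
    """Trapezoidal integration of y(x) on a grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def toy_return_fraction(age, r_max=0.4, t_half=1e8):
    """Hypothetical R_re(age): rises from 0 towards r_max with age."""
    return r_max * age / (age + t_half)

def stellar_masses(t_g, sfr0, tau, r_const, r_of_age, n=20001):
    """Return (M_int, M_ira, M_star) for a galaxy of age t_g."""
    t = np.linspace(0.0, t_g, n)
    sfr = sfr_exponential(t, sfr0, tau)
    # Definition 1: integrated SFH, recycling ignored.
    m_int = trapezoid(sfr, t)
    # Definition 2: constant recycled fraction removed immediately.
    m_ira = (1.0 - r_const) * m_int
    # Definition 3: stars formed at time t have age t_g - t today.
    m_star = trapezoid(sfr * (1.0 - r_of_age(t_g - t)), t)
    return m_int, m_ira, m_star
```

With `r_const` set equal to the asymptotic value of the
toy return fraction, the third definition falls between
the other two, because younger populations have not yet
returned their full recycled fraction.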


Appendix II: TEST-1 and TEST-2 Simulated 
Catalogs 


To generate a mock catalog that is as close as possible
to the observed data, we first predict the observed
1-dimensional distributions of the expectation values for
each of the main parameters (redshift, age, stellar mass
and extinction). This is done by using a sample of
galaxies in GOODS-S with available spectroscopic redshifts
and by fitting their SEDs to model templates generated
from BC03. For each parameter, we generated the
1-dimensional distribution of the observed values
and fitted it with an analytic function (i.e. Gaussian).
We then drew a mock sample of 1000 galaxies from these
distributions (with their associated multi-waveband
photometry) and only retained those with (a) ages between
10 Myr and the age of the universe at the redshift of the
galaxy and (b) 0 < E(B-V) < 1. The final sample selected
for the TEST-1 mock catalog satisfies these criteria, with
the total number of galaxies adjusted to be similar to the
spectroscopic sample in GOODS-S. This test therefore
contains 559 simulated points. The distributions of the
main parameters in the TEST-1 catalog are presented in
Figure 19. The redshift distribution here closely
resembles the observed distribution for galaxies
with spectroscopic redshifts in the GOODS-S field.
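
The draw-and-cut procedure described above can be sketched
as follows. Everything numerical in this example is an
assumption made for illustration: the Gaussian means and
widths are placeholders rather than the fitted GOODS-S
values, the parameters are drawn independently, the age of
the universe uses the analytic flat ΛCDM expression with
H0 = 70 km/s/Mpc and Ω_m = 0.3, and the z > 0 requirement
is added only to keep the toy draws physical. The
generation of the associated photometry is not covered.

```python
import numpy as np

def age_of_universe_gyr(z, h0=70.0, omega_m=0.3):
    """Age of a flat LambdaCDM universe at redshift z, in Gyr."""
    omega_l = 1.0 - omega_m
    hubble_time_gyr = 977.8 / h0  # 1/H0 in Gyr for H0 in km/s/Mpc
    x = np.sqrt(omega_l / omega_m) * (1.0 + z) ** -1.5
    return 2.0 / (3.0 * np.sqrt(omega_l)) * hubble_time_gyr * np.arcsinh(x)

def draw_mock_catalog(n_draw=1000, seed=0):
    """Draw parameters from 1-D Gaussians and apply the cuts."""
    rng = np.random.default_rng(seed)
    # Placeholder Gaussian parameters (NOT the fitted values).
    z = rng.normal(1.0, 0.5, n_draw)
    log_age = rng.normal(8.8, 0.6, n_draw)    # log10(age / yr)
    log_mass = rng.normal(10.0, 0.6, n_draw)  # log10(M / M_sun)
    ebv = rng.normal(0.2, 0.2, n_draw)
    age_gyr = 10.0 ** (log_age - 9.0)
    # Clip z before evaluating the cosmic age to avoid invalid powers.
    t_universe = age_of_universe_gyr(np.clip(z, 0.0, None))
    keep = (
        (z > 0.0)
        & (age_gyr > 0.01) & (age_gyr < t_universe)  # cut (a): 10 Myr to cosmic age
        & (ebv > 0.0) & (ebv < 1.0)                  # cut (b): 0 < E(B-V) < 1
    )
    return {"z": z[keep], "log_age": log_age[keep],
            "log_mass": log_mass[keep], "ebv": ebv[keep]}
```

The retained sample would then be trimmed or resampled to
the desired size, for instance to match the number of
galaxies in the spectroscopic sample.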

While a larger mock catalog (in terms of the
number of galaxies generated) would reduce the
Poisson noise in the analysis, we aimed for a
catalog which contains a similar number of galaxies
to the observed spectroscopic catalog. This
allows a more realistic estimate of the stellar mass
calibration errors when applying the results from
the mock data to the real data.

For TEST-2, light cones were used to directly
replicate the CANDELS field geometry. The N(z)
for this model is generated to closely resemble
the photometric redshift distribution for the
GOODS-S field, as shown in Figure 20. In
this test, stellar mass recycling is treated using
the “instantaneous” recycling approximation, in
which a fixed fraction of the mass that goes into stars
is immediately returned to the ISM during each
time step (Lu et al. 2014).

Figure 19. Distribution of physical parameters (redshift, stellar mass, age and extinction) for the TEST-1 mock catalog. The redshift
distribution is taken to be the same as the observed distribution in the spectroscopic sample used for training the mock catalog.

Figure 20. Input redshift distribution for the TEST-2A mock catalog.


















